Abstract: This paper studies two important signal processing
aspects of equilibrium behavior in non-cooperative games arising
in social networks, namely, reinforcement learning and detection
of equilibrium play. The first part of the paper presents a
reinforcement learning (adaptive filtering) algorithm that facilitates
learning an equilibrium by resorting to diffusion cooperation
strategies in a social network. Agents form homophilic social
groups, within which they exchange past experiences over an
undirected graph. It is shown that, if all agents follow the
proposed algorithm, their global behavior is attracted to the
set of correlated equilibria of the game. The second part of the paper
provides a test to detect if the actions of agents are consistent
with play from the equilibrium of a concave potential game.
The theory of revealed preference from microeconomics is used
to construct a non-parametric decision test and a statistical test,
both of which require only the probe and the associated actions of agents.
A stochastic gradient algorithm is given to optimize the probe
in real time to minimize the Type-II error probability of the
detection test subject to a specified Type-I error probability. We
provide a real-world example using the energy market, and a
numerical example to detect malicious agents in an online social
network.

Index Terms: Multi-agent signal processing, non-cooperative
games, social networks, correlated equilibrium, diffusion
cooperation, homophily behavior, revealed preferences, Afriat's
theorem, stochastic approximation algorithm.


I. Introduction 

Learning, rationalizability, and equilibrium in games
are of central importance in the analysis of social networks.
Game theory has traditionally been used in economics
and social sciences with a focus on fully rational interactions
where strong assumptions are made on the information
patterns available to individual agents. In comparison, social
networks are comprised of agents with limited cognition and
communication capabilities, and it is the dynamic interactions
among agents that are of interest. This, together with the
interdependence of agents' choices, motivates the need for
game-theoretic learning models for agents interacting in social
networks.

The game-theoretic notion of equilibrium describes a
condition of global coordination where all agents are content
with their social welfare. Reaching an equilibrium, however,
involves a complex process of agents guessing what each other
will do. Game-theoretic learning explains how such coordination
might arise as a consequence of a long-run process of
learning from interactions and adapting behavior Q.

Fig. 1. Two aspects of game-theoretic equilibrium behavior in social networks
discussed in this paper. Both aspects involve multi-agent signal processing
over a network. Stochastic approximation algorithms are used in both cases
to devise the desired scheme.


A. Main Ideas and Organization 

The two aspects of equilibrium behavior in games that are 
addressed in this paper along with the main tools used are 
illustrated in Fig. 1. These two aspects are relevant to the broad
area of machine learning of equilibria in social networks. The 
main results of this paper are summarized below: 

1) Reinforcement learning dynamics in social networks:
The first part of this paper (Sec. II to Sec. IV) addresses
the questions: Can a social network of self-interested agents
that possess limited sensing and communication capabilities
reach a global equilibrium behavior in a distributed fashion?
If so, can formation of social groups that exhibit identical
homophilic characteristics facilitate the learning dynamics
within the network? The main idea is to propose a
diffusion-based stochastic approximation algorithm (learning scheme)
such that, if every agent deploys it, the collective behavior of the
social network converges to the set of correlated equilibria.

Sec. II-A introduces non-cooperative games with
homophilic social groups in a social network. Homophily^1 refers
to a tendency of various types of individuals to exchange
information with others who are similar to themselves. The
detection of social groups that show common behavioural
characteristics can be performed using such methods as matched sample
estimation Q. Sec. II-B introduces correlated equilibrium Q
as the solution concept for such games. Correlated equilibrium
is a generalization of Nash equilibrium; however, it is more
realistic in multi-agent learning scenarios since observation
of the past history of decisions (or their associated outcomes)
naturally correlates agents' future decisions.

^1 The following illustrative example is provided for homophily
behavior in social networks: "If your friend jumped off a bridge, would you
jump too?" A possible reason for answering "yes" is that you are friends as a
result of your fondness for jumping off bridges. Notice that this is different from
contagion behavior where "your friend inspired you to jump off the bridge".
Due to space restrictions we do not consider contagion behavior in this paper.

In Sec. III, we present a regret-based reinforcement learning
algorithm that, resorting to diffusion cooperation strategies Q,
Q, implements cooperation among members of a social
group. The proposed algorithm is based on the well-known
regret-matching algorithm. The proposed algorithm^2
suits the emerging information patterns in social networks, and
allows the past experiences of agents to be combined across the
network, which facilitates the learning dynamics and enables
agents to respond in real time to changes underlying the
network. To the best of our knowledge, this is the first work
that uses diffusion cooperation strategies to implement
collaboration in game-theoretic reinforcement learning. Sec. IV
shows that if each agent individually follows the proposed
algorithm, the experienced regret will be at most $\epsilon$ after
sufficiently many repeated plays of the game. Moreover, if all agents
follow the proposed algorithm independently, their collective
behavior across the network will converge to an $\epsilon$-distance of
the polytope of correlated equilibria.

2) Detection of equilibrium play in social networks: The
second part of the paper (Sec. V to Sec. VI) addresses the
question: Given datasets of the external influence and actions
of agents in a social network, is it possible to detect if the
behavior of agents is consistent with play from the equilibrium
of a concave potential game? The theory of revealed preference
from microeconomics is used to construct a non-parametric
decision test which only requires the time series of data
$\mathcal{D} = \{(p_t, x_t) : t \in \{1, 2, \dots, T\}\}$, where $p_t \in \mathbb{R}^m$ denotes
the external influence, and $x_t \in \mathbb{R}^m$ denotes the action of
an agent. This question is fundamentally different from the
model-based theme that is widely used in the signal processing
literature, in which an objective function (typically convex)
is proposed and then algorithms are constructed to compute
the minimum. In contrast, the revealed preference approach
is data-centric: we wish to determine whether the dataset is
obtained from the interaction of utility maximizers.
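To make the data-centric viewpoint concrete, the following is a minimal sketch of the classical single-agent revealed preference check, namely the generalized axiom of revealed preference (GARP), which is known to be equivalent to the existence of a rationalizing utility function in Afriat's theorem. It is background illustration only: it is not the multi-agent test for concave potential games developed later in the paper, and the function name `satisfies_garp`, the tolerance, and the two toy datasets are assumptions introduced here.

```python
def satisfies_garp(prices, actions, tol=1e-9):
    """Check GARP for a dataset {(p_t, x_t)} of probes and responses."""
    T = len(prices)
    # cost[t][s] = p_t . x_s, the cost of response s under probe t
    cost = [[sum(p * x for p, x in zip(prices[t], actions[s]))
             for s in range(T)] for t in range(T)]
    # Direct revealed preference: x_t R x_s if x_s was affordable at t
    R = [[cost[t][t] >= cost[t][s] - tol for s in range(T)] for t in range(T)]
    # Transitive closure (Warshall's algorithm)
    for m in range(T):
        for t in range(T):
            for s in range(T):
                R[t][s] = R[t][s] or (R[t][m] and R[m][s])
    # GARP fails if x_t R x_s while x_t was strictly cheaper than x_s at s
    for t in range(T):
        for s in range(T):
            if R[t][s] and cost[s][s] > cost[s][t] + tol:
                return False
    return True


# Hypothetical data: responses of a Cobb-Douglas-type agent with budget 2
consistent = satisfies_garp(
    prices=[(1, 1), (1, 2), (2, 1)],
    actions=[(1, 1), (1, 0.5), (0.5, 1)],
)
# Hypothetical data in which each response is strictly preferred to the other
inconsistent = satisfies_garp(
    prices=[(1, 2), (2, 1)],
    actions=[(0, 2), (2, 0)],
)
```

Only the probe and response vectors enter the check; no parametric form of the utility function is assumed, which is the sense in which such tests are non-parametric.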

Sec. V introduces how revealed preferences can be used to
detect if the actions of agents originated from play from a
concave potential game using only the external influence and
actions of the agents. Specifically, in Sec. V-A we introduce the
preliminary tools of revealed preference for detecting utility
maximization of single agents. Sec. V-B provides a
non-parametric test to detect if the actions of agents are a result
of play from a concave potential game. If the actions are
measured in noise, then Sec. V-C provides a non-parametric
statistical test for play from a concave potential game with
guaranteed Type-I error probability. To reduce the probability
of Type-II errors of the statistical test, Sec. V-D provides a
simultaneous perturbation stochastic gradient (SPSA) algorithm
to adjust the external influence in real time. Two examples are
provided to illustrate the application of the decision test, statistical
test, and SPSA algorithm developed in Sec. V-B to Sec. V-D.

^2 Although we use the term "algorithm," the learning procedure mimics
human behavior; it involves minimizing a moving average regret and random
experimentation.


Concave potential games are considered for the detection
test as the detection tests for $\mathcal{D}$ to satisfy a Nash equilibrium
are very weak m-. In this paper, the requirement that $\mathcal{D}$
be consistent with the Nash equilibrium of a concave potential
game provides stronger restrictions than consistency with a
Nash equilibrium alone, while still encompassing a large set of utility
functions.

An interesting aspect of both parts of this paper is the
ordinal nature of decision making. In the learning dynamics of
Sec. III, the actions taken are ordinal; in the data-set parsing of
Sec. V, the utility function obtained is ordinal. Humans make
ordinal decisions^3 since humans tend to think in symbolic
ordinal terms.

B. Literature 

Game-theoretic models for social networks have been studied
widely G). GD- For example, HD formulate a graphical
games model where each agent's influence is restricted to its
immediate neighbors. The reader is referred to 0 for a
treatment of interactive sensing and decision making in social
networks from the signal processing perspective.

Reinforcement Learning Dynamics: Regret-matching Q-
Q is known to guarantee convergence to the set of correlated
equilibria Q under no structural assumptions on the game
model. The correlated equilibrium arguably provides a natural
way to capture conformity to social norms [15]. It can be
interpreted as a mediator 0 instructing people to take actions
according to some commonly known probability distribution.
The regret-based adaptive procedure in Q assumes a fully
connected network topology, whereas the regret-based
reinforcement learning algorithm in Q assumes a set of isolated
agents who neither know the game, nor share information of
their past decisions with others. In 0^ a regret-matching
algorithm is developed when agents exchange information over
a non-degenerate network connectivity graph.

The cooperation strategies in adaptive networked systems
can be classified as: (i) incremental strategies 0, (ii)
consensus strategies [19], and (iii) diffusion strategies ||^. In
the first class of strategies, information is passed from one
agent to the next over a cyclic path until all agents are
visited. In contrast, in the latter two, cooperation is enforced
among multiple agents rather than between two adjacent
neighbors. Diffusion strategies are shown to outperform
consensus strategies in pO); therefore, we concentrate on the former
to implement cooperation among social agents in Sec. III. The
diffusion strategies have been used previously for distributed
estimation, distributed decision making, and distributed Pareto
optimization in adaptive networks Q.

Detection of Equilibrium Play: Humans can be viewed as
social sensors that interact over a social network to provide
information about their environment. Social sensors go beyond
physical sensors; for example, user preferences for a particular
movie are available on Rotten Tomatoes but are difficult to
measure via physical sensors. Social sensors present unique
challenges from a statistical estimation point of view. First,
social sensors interact with and influence other social
sensors. Second, due to privacy concerns and time constraints,
social sensors typically do not reveal their personal preference
rankings between actions. In classical revealed preference
theory in the micro-economics literature, Afriat's theorem gives
a non-parametric finite sample test to decide if an agent's
actions in response to an external influence are consistent with utility
maximization f2T). The revealed preference test for single
agents has been applied to measuring the welfare effect of
price discrimination, analyzing the relationship between prices
of broadband Internet access and time of use service, and
auctions for advertisement position placement on page search
results from Google p2| .

^3 Humans typically convert numerical attributes to ordinal scales before
making decisions. For example, it does not matter if the cost of a meal at a
restaurant is $200 or $205; an individual would classify this cost as "high".
Also, credit rating agencies use ordinal symbols such as AAA, AA, A.

For interacting agents in a social network (i.e., players in a
game), single-agent tests are not suitable. Typically, the study
of interacting agents in a game requires parametric assumptions
on the form of the utility function of the agents. Deb p0|
was the first to propose a detection test for players engaged
in a concave potential game based on Varian's and Afriat's
work IH), p^. Potential games were introduced by Monderer
and Shapley [24] and are used extensively in the literature to
study the strategic behaviour of utility-maximizing agents.
A classical example is the congestion game p5) in which the
utility of each agent depends on the amount of resource it
and other agents in the social network use. Recently, the
analysis of energy use scheduling and demand side management
schemes in the energy market was performed using potential
games [26].

II. Learning Equilibria in Non-Cooperative Games 
With Homophilic Social Groups 

This section introduces a class of non-cooperative games
with homophilic social groups. Homophily refers to a tendency
of various types of individuals to associate with others who are
similar to themselves; see footnote 1. Agents in homophilic
relationships share common characteristics that motivate their
communication. We then proceed to present and elaborate on a
prominent solution concept in non-cooperative games, namely,
correlated equilibrium.

A. Non-Cooperative Game Model 

The standard representation of a non-cooperative game, 
known as normal form or strategic form game is comprised 
of three elements: 

1. Set of agents: $\mathcal{K} = \{1, \dots, K\}$. Essentially, an agent
models an entity that is entitled to making decisions. Agents
may be people, sensors, mobile devices, etc., and are indexed
by $k \in \mathcal{K}$.

2. Set of actions: $\mathcal{A}^k = \{1, \dots, A^k\}$, which denotes the
actions, also referred to as pure strategies, available to agent
$k$ at each decision point. A generic action taken by agent $k$ is
denoted by $a^k$. The actions of agents may range from deciding
to establish or abolish links with other agents to choosing
among different technologies p^ .

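A generic joint action profile of all agents is denoted by

$$\mathbf{a} = (a^1, \dots, a^K) \in \mathcal{A}^{\mathcal{K}}, \quad \text{where } \mathcal{A}^{\mathcal{K}} = \mathcal{A}^1 \times \cdots \times \mathcal{A}^K,$$

and $\times$ denotes the Cartesian product. Following the common
notation in game theory, one can rearrange $\mathbf{a}$ as

$$\mathbf{a} = (a^k, \mathbf{a}^{-k}), \quad \text{where } \mathbf{a}^{-k} = (a^1, \dots, a^{k-1}, a^{k+1}, \dots, a^K)$$

denotes the action profile of all agents excluding agent $k$.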

3. Utility function: $u^k : \mathcal{A}^{\mathcal{K}} \to \mathbb{R}$ is bounded, and
determines the payoff to agent $k$ as a function of the action profile
$\mathbf{a}$ taken by all agents. The interpretation of such a payoff is the
aggregated rewards and costs associated with the chosen action
as the outcome of the interaction. The payoff function can
be quite general: It could reflect reputation or privacy, using
the models in mg, pg, or benefits and costs associated with
maintaining links in a social network, using the models in p7) ,
0. It could also reflect benefits of consumption and the costs
of production, download, and upload in content production
and sharing over peer-to-peer networks p2) , or the capacity
available to users in communication networks | |^ .

Throughout the paper, we restrict our attention to
non-cooperative games in social networks in which agents have
identical homophilic characteristics. These situations are
modeled by a symmetric non-cooperative game in the economics
literature, which is formally defined as follows:

Definition 2.1: A normal-form non-cooperative game is
symmetric if agents have identical action spaces, i.e., $\mathcal{A}^k =
\mathcal{A} = \{1, \dots, A\}$ for all $k \in \mathcal{K}$, and for all $k, l \in \mathcal{K}$:

$$u^k(a^k, \mathbf{a}^{-k}) = u^l(a^l, \mathbf{a}^{-l}), \quad \text{if } a^k = a^l \text{ and } \mathbf{a}^{-k} = \mathbf{a}^{-l}.$$ (1)

Intuitively speaking, in a symmetric game, the identities of
agents can be changed without transforming the payoffs
associated with decisions. Symmetric games have been used in
the literature to model interaction of buyers and sellers in
the global electronic market p4) , clustering p5) , cooperative
spectrum sensing p^ , and network formation models with
costs for establishing links m

Agents can further subscribe to social groups within which
they share information about their past experiences. This is
referred to as neighborhood monitoring HI)- The
communication among agents, and hence their level of "social knowledge,"
can be captured by a connectivity graph, defined as follows:

Definition 2.2: The connectivity graph is a simple^4 graph
$\mathcal{G} = (\mathcal{E}, \mathcal{K})$, where agents form vertices of the graph, and

$$(k, l) \in \mathcal{E} \iff \text{agents } k \text{ and } l \text{ exchange information.}$$

The open and closed neighborhoods of each agent $k$ are then,
respectively, defined by

$$\mathcal{N}^k := \{l \in \mathcal{K} : (k, l) \in \mathcal{E}\}, \quad \text{and} \quad \mathcal{N}^k_c := \mathcal{N}^k \cup \{k\}.$$ (2)

Agents are, in fact, oblivious to the existence of other agents
except their immediate neighbors on the network topology,
nor are they aware of the dependence of the outcome of their
decisions on those of other agents outside their social group.
Besides exchanging past decisions with neighbors, agents
realize the stream of payoffs as the outcome of their choices.
At each time $n = 1, 2, \dots$, each agent $k$ makes a decision $a^k_n$,
and realizes her utility

$$u^k_n(a^k_n) = u^k(a^k_n, \mathbf{a}^{-k}_n).$$ (3)

Here, we assume that the agents are unaware of the exact
form of the utility functions. However, even if agent $k$ knows
the utility function, computing utilities is impossible as she
observes some (but not all) elements of $\mathbf{a}^{-k}_n$.

^4 A simple graph is an unweighted, undirected graph containing no self
loops or multiple edges.

Remark 2.1: It is straightforward to generalize the game 
model described above to social networks in which agents 
form multiple homophilic groups within which each agent 
forms a social group of its own. The algorithm that we 
present next can be employed in such clustered networks 
with no further modification. However, for simplicity of the 
presentation, we continue to use the single homophilic group 
in the rest of this paper. 

B. Correlated Equilibrium 

In the first part, we focus on correlated equilibrium, which 
is defined as follows: 

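Definition 2.3 (Correlated Equilibrium): Let $\pi$ denote a
joint distribution on the joint action space $\mathcal{A}^{\mathcal{K}}$, i.e.,

$$\pi(\mathbf{a}) \ge 0, \ \forall \mathbf{a} \in \mathcal{A}^{\mathcal{K}}, \quad \text{and} \quad \sum_{\mathbf{a} \in \mathcal{A}^{\mathcal{K}}} \pi(\mathbf{a}) = 1.$$

The set of correlated $\epsilon$-equilibria $\mathcal{C}_\epsilon$ is the convex polytope
given in (4), where $\pi^k(i, \mathbf{a}^{-k})$
denotes the probability that agent $k$ picks action $i$ and the
rest pick $\mathbf{a}^{-k}$. If $\epsilon = 0$, the convex polytope represents the set of
correlated equilibria, and is denoted by $\mathcal{C}$.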
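As a small illustration of the definition, the sketch below checks the inequalities that define the set of correlated $\epsilon$-equilibria, read here as: for every agent $k$ and every pair of actions $i, j$, the expected gain from playing $j$ whenever $i$ is recommended must not exceed $\epsilon$. The two-agent "chicken" payoff table, the function names, and the dictionary representation of $\pi$ are assumptions made for this example and do not come from the paper.

```python
from itertools import product


def max_deviation_gain(pi, utility, num_agents, num_actions):
    """Largest expected gain over all agents k and deviations i -> j."""
    worst = float("-inf")
    for k in range(num_agents):
        for i, j in product(range(num_actions), repeat=2):
            gain = 0.0
            for a, prob in pi.items():
                if a[k] == i:
                    # Same joint action, except agent k plays j instead of i
                    deviated = a[:k] + (j,) + a[k + 1:]
                    gain += prob * (utility(k, deviated) - utility(k, a))
            worst = max(worst, gain)
    return worst


def is_correlated_equilibrium(pi, utility, num_agents, num_actions, eps=0.0):
    gain = max_deviation_gain(pi, utility, num_agents, num_actions)
    return gain <= eps + 1e-12


# Hypothetical symmetric game of chicken: action 0 = dare, 1 = yield
CHICKEN = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken_utility(k, a):
    return CHICKEN[a][k]


# Mediator that never recommends (dare, dare)
pi_mediated = {(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}
# Point mass on (dare, dare): each agent gains by yielding instead
pi_crash = {(0, 0): 1.0}

mediated_ok = is_correlated_equilibrium(pi_mediated, chicken_utility, 2, 2)
crash_ok = is_correlated_equilibrium(pi_crash, chicken_utility, 2, 2)
```

The mediated distribution is correlated (it is not a product of marginals), which illustrates why the polytope can contain points that no pair of independent strategies can reach.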

Several reasons motivate adopting the correlated equilibrium
in large-scale social networks. It is structurally and
computationally simpler than the Nash equilibrium. The coordination
among agents in the correlated equilibrium can further lead
to potentially higher utilities than if agents take their actions
independently (as required by Nash equilibrium) Q. Finally,
it is more realistic as the observation of the common history
of actions naturally correlates agents' future decisions Q.

An intuitive interpretation of correlated equilibrium is
"coordination in decision-making." Suppose a mediator is
observing a repeated interactive decision making process among
multiple selfish agents. The mediator, at each period, gives each
agent a private recommendation as to what action to take. The
recommendations are correlated as the mediator draws them
from a joint probability distribution on the action profile of all
agents; however, each agent is only given recommendations
about her own decision. Each agent can freely interpret the
recommendations and decide whether to follow them. A correlated
equilibrium results if no agent wants to deviate from the
provided recommendation. That is, in correlated equilibrium,
agents' decisions are coordinated as if there exists a global
coordinating device that all agents trust to follow.

III. Regret-Based Collaborative Decision Making 

This section presents the adaptive decision making
algorithm that combines the regret-based reinforcement learning
procedure Q, in the economics literature, with the diffusion
cooperation strategies 0, 0, which have recently attracted
much attention in the signal processing community.

A. Agents’ Beliefs 

Time is discrete, $n = 1, 2, \dots$. At each time $n$, the
agent makes a decision according to a decision strategy
$p^k_n = (p^k_n(1), \dots, p^k_n(A))$, which relies on the agent's belief
matrix $R^k_n = [r^k_n(i,j)]$. Each element $r^k_n(i,j)$ records the
discounted time-averaged regret (loss in utility) for not having
selected action $j$ every time the agent played action $i$ in the
past, and is updated via the recursive expression (5).
In (5), $0 < \epsilon \ll 1$ is
a small parameter that represents the adaptation rate of the
strategy update procedure, and is required when agents face a
game where the parameters (e.g., utility functions) slowly jump
change over time [17]. Further, $I(X)$ denotes the indicator
operator: $I(X) = 1$ if statement $X$ is true, and 0 otherwise.
Note that the update mechanism in (5) relies only on the
realized utilities, defined in (3).

Positive $r^k_n(i,j)$ implies the opportunity to gain by
switching from action $i$ to $j$ in the future. Therefore, the regret-matching
reinforcement learning procedure, which we present later in this
section, assigns positive probabilities to all actions $j$ for which
$r^k_n(i,j) > 0$. In fact, the probabilities of switching to different
actions are proportional to their regrets relative to the current
action, hence the name 'regret-matching'.
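The regret recursion can be sketched in a few lines. The sketch assumes that update (5) has the usual regret-matching form $r_{n+1}(i,j) = r_n(i,j) + \epsilon\,[f_{ij} - r_n(i,j)]$ with $f_{ij} = \frac{p_n(i)}{p_n(j)} u_n I(a_n = j) - u_n I(a_n = i)$, which is how the garbled display is read here; the function names and the numerical values are hypothetical.

```python
def regret_increment(p, action, utility):
    """Instantaneous regret matrix built from the realized utility only."""
    A = len(p)
    F = [[0.0] * A for _ in range(A)]
    for i in range(A):
        for j in range(A):
            # Importance-weighted estimate of what j would have earned
            gain = (p[i] / p[j]) * utility if action == j else 0.0
            # What was actually earned on the occasions i was played
            loss = utility if action == i else 0.0
            F[i][j] = gain - loss
    return F


def update_regret(R, p, action, utility, eps):
    """Exponentially discounted average of the instantaneous regrets."""
    A = len(R)
    F = regret_increment(p, action, utility)
    return [[R[i][j] + eps * (F[i][j] - R[i][j]) for j in range(A)]
            for i in range(A)]


# Two actions, uniform strategy, action 1 played and utility 2.0 realized
R_next = update_regret(
    R=[[0.0, 0.0], [0.0, 0.0]], p=[0.5, 0.5], action=1, utility=2.0, eps=0.1
)
```

In this toy step the entry for switching from action 0 to action 1 becomes positive, since action 1 has just paid off, while the diagonal stays at zero because an action carries no regret relative to itself.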

B. Diffusion Cooperation Strategy 

Inspired by the idea of diffusion least mean squares over
adaptive networks ©, n), we enforce cooperation among
neighboring agents via exchanging and fusing regret
information. Such diffusion of regret information is rewarding since
agents belong to the same homophilic group; see
Definition 2.1. That is, all agents attempt to optimize the same
utility function, albeit in the presence of interdependence
among their decisions. It has been shown in 0, that
such cooperation strategies can lead to faster resolution of
uncertainties in decentralized inference and optimization
problems over adaptive networks, and enable agents to respond
in real time to changes underlying such problems. In view of
these benefits, this paper studies, for the first time, the application
of such diffusion cooperation strategies in the game-theoretic
reinforcement learning context in social networks.

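$$\mathcal{C}_\epsilon = \Big\{ \pi : \sum_{\mathbf{a}^{-k}} \pi^k(i, \mathbf{a}^{-k}) \big[ u^k(j, \mathbf{a}^{-k}) - u^k(i, \mathbf{a}^{-k}) \big] \le \epsilon, \ \forall i, j \in \mathcal{A}^k, \ k \in \mathcal{K} \Big\}$$ (4)

$$r^k_{n+1}(i,j) = r^k_n(i,j) + \epsilon \Big[ \frac{p^k_n(i)}{p^k_n(j)} u^k_n(a^k_n) \cdot I(a^k_n = j) - u^k_n(a^k_n) \cdot I(a^k_n = i) - r^k_n(i,j) \Big]$$ (5)

Fig. 2. Regret-matching with diffusion cooperation.

At the end of each decision period $n$, agent $k$ shares the
belief matrix $R^k_n$ with the neighbors on the network
connectivity graph $\mathcal{G}$; see Definition 2.2. Agent $k$ then fuses
the collected information via a linear combiner 0, 0:

$$\overline{R}^k_n = \sum_{l \in \mathcal{N}^k_c} w_{kl} R^l_n$$ (6)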


where $w_{kl}$ denotes the weight that agent $k$ assigns to the
regrets experienced by agent $l$ in her immediate neighborhood
on the network connectivity graph. These weights give rise to
a global weight matrix $W = [w_{kl}]$ in the network of agents.
In this paper, we assume:

$$W := I_K + \epsilon C,$$ (7)

where $\epsilon$ is the same as the step-size in (5), and

$$C = C', \quad C \mathbf{1}_K = \mathbf{0}_K, \quad |c_{kl}| \le 1, \quad c_{kl} \ge 0 \text{ for } k \ne l, \quad c_{kl} > 0 \text{ iff } (k, l) \in \mathcal{E}.$$

Here, $I_K$ denotes the $K \times K$ identity matrix, $(\cdot)'$ denotes the
transpose operator, and $\mathbf{1}_K$ and $\mathbf{0}_K$ represent $K \times 1$ vectors of all
ones and zeros, respectively. Agent $k$ then combines the fused
information with her own realized utility at the current period,
and updates the decision policy for the next period.

It is shown in p9) that, by properly rescaling the periods
at which observation of individual agents and fusion of
neighboring beliefs take place, the standard diffusion cooperation
strategy in p8| can be approximated by the diffusion strategy
with the weight matrix $W$ in (7). This further allows using the
well-known ordinary differential equation (ODE) method pO) ,
ED for the convergence analysis. In light of (6), the belief of
each agent is a function of both her own past experience and
those of neighboring agents. This enables them to respond in
real time to the non-stationarities underlying the game.
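The structure of the weight matrix and of the fusion step can be illustrated as follows. This is a sketch under stated assumptions: the off-diagonal entries of $C$ are set to $1/K$ on every edge (one convenient choice that keeps $C$ symmetric with zero row sums and entries bounded by one in magnitude), and the names `build_weight_matrix` and `fuse`, the three-agent path graph, and the numerical values are all hypothetical.

```python
def build_weight_matrix(num_agents, edges, eps):
    """Return W = I + eps * C for an undirected connectivity graph."""
    K = num_agents
    C = [[0.0] * K for _ in range(K)]
    for k, l in edges:
        # Symmetric, strictly positive weight exactly on the graph edges
        C[k][l] = C[l][k] = 1.0 / K
    for k in range(K):
        # Diagonal chosen so that every row of C sums to zero
        C[k][k] = -sum(C[k][l] for l in range(K) if l != k)
    return [[(1.0 if k == l else 0.0) + eps * C[k][l] for l in range(K)]
            for k in range(K)]


def fuse(W, regrets):
    """Linear combiner: agent k receives sum_l W[k][l] * R^l."""
    K = len(W)
    A = len(regrets[0])
    return [[[sum(W[k][l] * regrets[l][i][j] for l in range(K))
              for j in range(A)] for i in range(A)] for k in range(K)]


# Path graph 0 - 1 - 2 with step-size 0.1
W = build_weight_matrix(3, [(0, 1), (1, 2)], eps=0.1)
# One-action toy regrets so that each matrix is 1 x 1
fused = fuse(W, [[[1.0]], [[2.0]], [[6.0]]])
```

Because the rows of $C$ sum to zero, each row of $W$ sums to one, so fusion is a weighted average that leans heavily on the agent's own regrets and only slightly (of order $\epsilon$) on those of her neighbors.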


Algorithm 1 Regret-Matching With Diffusion Cooperation 

Initialization: Set 

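where $u^k_{\max}$ and $u^k_{\min}$ denote the upper and lower bounds on
the utility function, respectively. Set the step-size $0 < \epsilon \ll 1$,
and initialize

$$p^k_0 = (1/A) \cdot \mathbf{1}_A, \quad R^k_0 = 0.$$

Step 1: Choose Action Based on Past Regret.

$$a^k_n \sim p^k_n = (p^k_n(1), \dots, p^k_n(A))$$

where $p^k_n(i)$ is given in (9), and $|x|^+ = \max\{0, x\}$.

Step 2: Update Individual Regret.

$$R^k_{n+1} = \overline{R}^k_n + \epsilon \big[ F^k(a^k_n) - \overline{R}^k_n \big]$$ (10)

where $F^k(a^k_n) = [f^k_{ij}(a^k_n)]$ is an $A \times A$ matrix with elements

$$f^k_{ij}(a^k_n) = \frac{p^k_n(i)}{p^k_n(j)} u^k_n(a^k_n) \cdot I(a^k_n = j) - u^k_n(a^k_n) \cdot I(a^k_n = i).$$

Step 3: Fuse Regrets with Members of Social Group.

$$\overline{R}^k_{n+1} = \sum_{l \in \mathcal{N}^k_c} w_{kl} R^l_{n+1}.$$ (11)

Recursion. Set $n \leftarrow n + 1$, and go to Step 1.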


C. Regret-Matching With Diffusion Cooperation

The proposed reinforcement learning algorithm is summarized
in the following protocol that mimics the human learning
process:

Social Decision Protocol

Step 1: Individual $k$ chooses an action randomly from
a weight vector (probabilities) $p^k$. This weight vector is
an ordinal function^5 of the regret due to its previous actions.

Step 2: The individual updates regrets based on its actions
and the associated outcomes as $\sum_{\tau=1}^{n} (1 - \epsilon)^{n-\tau} F(a^k_\tau)$,
where $F(\cdot)$ is an ordinal function of the action. The
exponential discounting places more importance on recent
actions.

Step 3: The individual then shares and fuses its regrets
with other individuals in the social group.

We abstract the above social decision protocol into
Algorithm 1 so as to facilitate analysis of the global behavior;
see also Fig. 2 for a schematic illustration of Algorithm 1. The
first two steps implement the regret-matching reinforcement
learning procedure 0, whereas the last step implements the
diffusion protocol |[^ p8) .

^5 An ordinal function orders pairs of alternatives such that one is considered
to be worse than, equal to, or better than the other; see item c) in Sec. III-D.
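The three steps of the protocol can be put together in a short simulation sketch. Several choices below are assumptions rather than the paper's specification: the decision strategy is taken to be $(1-\delta)\min\{|r(a_{n-1}, i)|^+/\mu, 1/A\} + \delta/A$ for $i \ne a_{n-1}$ with the remaining mass on the last action; the fused regrets are used both to choose actions and as the starting point of the individual update; the normalization constant `mu`, the congestion-style payoff, the $1/K$ edge weights, and all function names are hypothetical.

```python
import random


def decision_strategy(R, last, mu, delta):
    """Step 1: switch with probability matched to positive regret."""
    A = len(R)
    p = [0.0] * A
    for i in range(A):
        if i != last:
            matched = min(max(R[last][i], 0.0) / mu, 1.0 / A)
            p[i] = (1.0 - delta) * matched + delta / A
    p[last] = 1.0 - sum(p)  # inertia: remaining mass stays on last action
    return p


def update_regret(R, p, a, u, eps):
    """Step 2: discounted regret update from the realized utility u."""
    A = len(R)
    out = [[0.0] * A for _ in range(A)]
    for i in range(A):
        for j in range(A):
            f = ((p[i] / p[j]) * u if a == j else 0.0) - (u if a == i else 0.0)
            out[i][j] = R[i][j] + eps * (f - R[i][j])
    return out


def weight_matrix(K, edges, eps):
    C = [[0.0] * K for _ in range(K)]
    for k, l in edges:
        C[k][l] = C[l][k] = 1.0 / K
    for k in range(K):
        C[k][k] = -sum(C[k][l] for l in range(K) if l != k)
    return [[(k == l) + eps * C[k][l] for l in range(K)] for k in range(K)]


def fuse(W, regrets):
    """Step 3: each agent averages regrets over its closed neighborhood."""
    K, A = len(W), len(regrets[0])
    return [[[sum(W[k][l] * regrets[l][i][j] for l in range(K))
              for j in range(A)] for i in range(A)] for k in range(K)]


def run(K, A, edges, utility, steps, eps=0.05, delta=0.1, mu=3.0, seed=0):
    rng = random.Random(seed)
    W = weight_matrix(K, edges, eps)
    fused = [[[0.0] * A for _ in range(A)] for _ in range(K)]
    last = [rng.randrange(A) for _ in range(K)]
    probs = []
    for _ in range(steps):
        probs = [decision_strategy(fused[k], last[k], mu, delta)
                 for k in range(K)]
        acts = [rng.choices(range(A), weights=probs[k])[0] for k in range(K)]
        local = [update_regret(fused[k], probs[k], acts[k],
                               utility(k, acts), eps) for k in range(K)]
        fused = fuse(W, local)
        last = acts
    return fused, probs


def congestion_utility(k, acts):
    """Hypothetical symmetric payoff in [0, 1]: crowded actions pay less."""
    same = sum(1 for l, a in enumerate(acts) if l != k and a == acts[k])
    return 1.0 - same / (len(acts) - 1)


fused, probs = run(3, 2, [(0, 1), (1, 2)], congestion_utility, steps=200)
```

Each agent only ever touches its own realized utility and the regret matrices of its neighbors, which mirrors the limited information pattern assumed in the game model.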


D. Discussion and Intuition 

Distinct properties of the local adaptation and learning
algorithm summarized in Algorithm 1 are as follows:

a) Decision strategy: The randomized strategy (9),

$$p^k_n(i) = \begin{cases} (1 - \delta) \min\Big\{ \frac{1}{\mu^k} \big| r^k_n(a^k_{n-1}, i) \big|^+, \frac{1}{A} \Big\} + \frac{\delta}{A}, & i \ne a^k_{n-1}, \\ 1 - \sum_{j \ne i} p^k_n(j), & i = a^k_{n-1}, \end{cases}$$ (9)

is simply a weighted average of two probability vectors: The
first term, with weight $1 - \delta$, is proportional to the positive
part of the regrets. Taking the minimum with $1/A$ guarantees
that $p^k_n$ is a valid probability distribution, i.e., $\sum_i p^k_n(i) = 1$.
The second term, with weight $\delta$, is just a uniform distribution
over the action space 0, (nl. It forces every action to be
played with some minimal frequency (strictly speaking, with
probability $\delta/A$). The exploration factor $\delta$ is essential to be
able to estimate the contingent utilities using only the realized
utilities; it can, as well, be interpreted as exogenous statistical
"noise." As will be discussed later, larger $\delta$ will lead to the
convergence of the global behavior to a larger $\epsilon$-distance of
the correlated equilibria set.

b) Adaptive behavior: In (10), $\epsilon$ essentially introduces
an exponential forgetting of the experienced regrets in the
past, and facilitates adaptivity to the evolution of the
non-cooperative game model over time. As agents successively
take actions, the effect of the old experiences on their current
decisions vanishes. This enables tracking time variations on a
timescale that is as fast as the adaptation rate of Algorithm 1.

c) Ordinal choice of actions: The decision strategy is
an ordinal function of the experienced regrets. Actions are
ordered based on the regret values with the exception that
all actions with negative regret are considered to be equally
desirable; see footnote 3.

d) Inertia: The choice of $\mu^k$ guarantees that there is
always a positive probability of picking the same action as
the last period. Therefore, $\mu^k$ can be viewed as an "inertia"
parameter. It mimics humans' decision making process and
plays a significant role in breaking away from bad cycles. This
inertia is, in fact, the very factor that makes convergence to the
correlated equilibria set possible under (almost) no structural
assumptions on the underlying game |j^.

e) Markov chain construction: The sequence $\{a^k_n\}$
is a Markov chain with state space $\mathcal{A}$, and transition
probability matrix with entries

$$P(a^k_n = i \mid a^k_{n-1} = j).$$

The above transition matrix is continuous, irreducible and
aperiodic for each $R^k_n$. It is (conditionally) independent of
other agents' action profile, which may be correlated. More
precisely, let $h_n = \{\mathbf{a}_\tau\}_{\tau=1}^{n}$, where $\mathbf{a}_\tau = (a^k_\tau, \mathbf{a}^{-k}_\tau)$, denote
the history of decisions made by all agents up to time $n$. Then,

$$\Pr(\mathbf{a}_n = (i, \mathbf{a}) \mid h_{n-1}) = \Pr(a^k_n = i \mid h_{n-1}) \Pr(\mathbf{a}^{-k}_n = \mathbf{a} \mid h_{n-1}) = p^k_n(i) \Pr(\mathbf{a}^{-k}_n = \mathbf{a} \mid h_{n-1}).$$ (12)

The sample path of this Markov chain $\{a^k_n\}$ is fed back into the
stochastic approximation algorithm that updates $R^k_n$, which in
turn affects its transition matrix. This interpretation is useful in
Sec. IV when deriving the limit dynamical system representing
the behavior of Algorithm 1.

IV. Emergence of Rational Global Behavior 

This section characterizes the global behavior emerging
from agents individually following Algorithm 1.

A. Global Behavior 

The global behavior $z_n$ of the network at each time $n$ is
defined as the discounted empirical frequency of the joint action
profile of all agents up to period $n$. Formally,

$$z_n = (1 - \epsilon)^{n-1} e_{\mathbf{a}_1} + \epsilon \sum_{2 \le \tau \le n} (1 - \epsilon)^{n-\tau} e_{\mathbf{a}_\tau}$$ (13)

where $e_{\mathbf{a}_\tau}$ is a unit vector on the space of all possible joint
action profiles with the element corresponding to the joint
play $\mathbf{a}_\tau$ being equal to one. The small parameter $0 < \epsilon \ll 1$
is the same as the adaptation rate in (10). It introduces an
exponential forgetting of the past decision profiles to enable
adaptivity of the network behavior to the evolution of the
game model. That is, the effect of the old game model
on the decisions of agents vanishes as they repeatedly take
actions. Given $z_n$, the average utility accrued by each agent
can be straightforwardly evaluated, hence the name global
behavior. It is more convenient to define $z_n$ via the stochastic
approximation recursion

$$z_n = z_{n-1} + \epsilon \big[ e_{\mathbf{a}_n} - z_{n-1} \big].$$ (14)
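The equivalence between the discounted empirical frequency and its recursive form can be checked numerically. The sketch below assumes the closed form reads $z_n = (1-\epsilon)^{n-1} e_{\mathbf{a}_1} + \epsilon \sum_{\tau=2}^{n} (1-\epsilon)^{n-\tau} e_{\mathbf{a}_\tau}$, so that the recursion starts from $z_1 = e_{\mathbf{a}_1}$; the function names, the dictionary representation of $z_n$, and the short sequence of joint plays are hypothetical.

```python
from itertools import product


def global_behavior_recursive(joint_actions, profiles, eps):
    """z_n = z_{n-1} + eps * (e_{a_n} - z_{n-1}), started at z_1 = e_{a_1}."""
    z = {a: 0.0 for a in profiles}
    z[joint_actions[0]] = 1.0
    for a_n in joint_actions[1:]:
        for a in profiles:
            indicator = 1.0 if a == a_n else 0.0
            z[a] += eps * (indicator - z[a])
    return z


def global_behavior_closed_form(joint_actions, profiles, eps):
    """Discounted empirical frequency written out as a weighted sum."""
    n = len(joint_actions)
    z = {a: 0.0 for a in profiles}
    z[joint_actions[0]] += (1.0 - eps) ** (n - 1)
    for tau in range(2, n + 1):
        z[joint_actions[tau - 1]] += eps * (1.0 - eps) ** (n - tau)
    return z


# Two agents with two actions each, observed over four periods
profiles = list(product(range(2), repeat=2))
plays = [(0, 0), (0, 1), (1, 1), (0, 1)]
z_rec = global_behavior_recursive(plays, profiles, eps=0.1)
z_sum = global_behavior_closed_form(plays, profiles, eps=0.1)
```

Both forms keep $z_n$ on the probability simplex over joint action profiles, which is what allows it to be compared directly with the polytope of correlated equilibria.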

B. Asymptotic Local and Global Behavior 

In what follows, we present the main theorem that reveals 
both the local and global behavior emerging from each agent 
individually following Algorithm [T] in a static game model. 
The regret matrices k G 1C, and z„ will be used as 
indicatives of agent’s local and global experience, respectively. 

We use stochastic averaging [?] in order to characterize the
asymptotic behavior of Algorithm 1. The basic idea is that, via
a ‘local’ analysis, the noise effects in the stochastic algorithm
are averaged out so that the asymptotic behavior is determined
by that of a ‘mean’ dynamical system. To this end, in lieu of
working with the discrete-time iterates directly, one works with
continuous-time interpolations of the iterates. Accordingly,
define the piecewise constant interpolated processes

$$R^{k,\varepsilon}(t) = R^k_n, \quad z^\varepsilon(t) = z_n \quad \text{for } t \in [n\varepsilon, (n+1)\varepsilon). \qquad (15)$$

Further, with slight abuse of notation, denote by $R^k_n$ the
regret matrix rearranged as a vector of length $(A)^2$, rather than
an $A \times A$ matrix, and let $R^{k,\varepsilon}(\cdot)$ represent
the associated interpolated vector processes; see (15). Let further
$\|\cdot\|$ denote the Euclidean norm, and $\mathbb{R}_-$ represent
the negative orthant in the Euclidean space of appropriate
dimension. The following theorem characterizes the local and
global behavior emergent from following Algorithm 1.

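Theorem 4.1: Let $\{t_\varepsilon\}$ be any sequence of real numbers
satisfying $t_\varepsilon \to \infty$ as $\varepsilon \to 0$. For each
$\varepsilon$, there exists an upper bound $\hat{\delta}(\varepsilon)$
on the exploration parameter $\delta$ such that, if every agent
follows Algorithm 1 with $0 < \delta < \hat{\delta}(\varepsilon)$ in
the decision strategy, as $\varepsilon \to 0$, the following results
hold:

(i) The regret vector $R^{k,\varepsilon}(\cdot + t_\varepsilon)$
converges in probability to an $\varepsilon$-distance of the negative
orthant. That is, for any $\beta > 0$,

$$\lim_{\varepsilon \to 0} P\left( \mathrm{dist}\left[ R^{k,\varepsilon}(\cdot + t_\varepsilon), \mathbb{R}_- \right] - \varepsilon > \beta \right) = 0 \qquad (16)$$

where $\mathrm{dist}[\cdot,\cdot]$ denotes the usual distance function.

(ii) The global behavior vector $z^\varepsilon(\cdot + t_\varepsilon)$
converges in probability to the correlated $\varepsilon$-equilibria
set $\mathcal{C}_\varepsilon$ in the sense that

$$\mathrm{dist}\left[ z^\varepsilon(\cdot + t_\varepsilon), \mathcal{C}_\varepsilon \right] = \inf_{z \in \mathcal{C}_\varepsilon} \left\| z^\varepsilon(\cdot + t_\varepsilon) - z \right\| \to 0. \qquad (17)$$

Proof: See Appendix A for a sketch of the proof. ■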
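For a concrete reading of $\mathrm{dist}[\cdot, \mathbb{R}_-]$ in
(16): the Euclidean distance from a vector to the negative orthant
is the norm of its positive part, since projecting onto the orthant
simply clips positive entries to zero. A minimal Python sketch,
with hypothetical function names and the regret vector supplied as
a plain list:

```python
import math


def dist_to_negative_orthant(regret):
    """Euclidean distance from a regret vector to the negative orthant.

    Negative (and zero) entries contribute nothing; only positive regrets count.
    """
    return math.sqrt(sum(max(r, 0.0) ** 2 for r in regret))


def within_eps_of_orthant(regret, eps):
    """True if the regret vector lies within an eps-distance of the negative orthant."""
    return dist_to_negative_orthant(regret) <= eps


if __name__ == "__main__":
    print(dist_to_negative_orthant([-1.0, -2.0]))
    print(dist_to_negative_orthant([3.0, -1.0, 4.0]))
```

A vector with no positive entry is already in the orthant and has
distance zero, which corresponds to an agent experiencing no regret.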

TABLE I
Agents’ Payoffs in a Symmetric Non-Cooperative Game

                  a^3 = 1                  a^3 = 2
            a^2 = 1    a^2 = 2       a^2 = 1    a^2 = 2
a^1 = 1     (2,2,5)    (3,6,4)       (1,1,3)    (1,4,5)
a^1 = 2     (6,3,4)    (4,4,6)       (4,1,0)    (6,6,4)

Fig. 3. Distance to correlated equilibrium vs. iteration number.

The above theorem simply asserts that, if an agent
individually follows Algorithm 1, she will experience regret
of at most $\varepsilon$ after sufficient repeated plays of the game.
Indeed, $\varepsilon$ can be made arbitrarily small by properly
choosing the exploration parameter $\delta$ in the decision
strategy. It further states that, if now all agents in the
networked multi-agent system start following Algorithm 1
independently, their collective behavior converges to the
correlated $\varepsilon$-equilibria set. Differently put,
agents can coordinate their strategies in a distributed fashion 
so that the distribution of their joint behavior is close to the 
correlated equilibria polytope. From the game-theoretic point 
of view, it shows that non-fully rational local behavior of 
agents—due to utilizing a ‘better-response’ rather than a ‘best- 
response’ strategy—can lead to the manifestation of globally 
sophisticated and rational behavior at the network level. Note 
in the above theorem that the convergence arguments are to a 
set rather than a particular point in that set. 

Remark 4.1: The constant step-size in Algorithm 1 enables
it to adapt to changes underlying the game model. Using
weak convergence methods [?], it can be shown that the
first result in Theorem 4.1 holds if the parameters underlying
the game undergo random changes on a timescale that is no
faster than the timescale determined by the adaptation rate of
Algorithm 1. The second result in Theorem 4.1 will further
hold if the changes occur on a slower timescale. The reader
is referred to [?] for further details.


C. Numerical Example 

The limiting behavior of Algorithm 1 follows a differential
inclusion; see the proof of Theorem 4.1 in Appendix A.
Differential inclusions are generalizations of ordinary
differential equations (ODEs) in which the sample paths belong
to a set; therefore, independent runs lead to different sample
paths. This prohibits deriving an analytical rate of convergence
for reinforcement learning algorithms of this type. Here, we
resort to Monte Carlo simulations to illustrate and compare
the performance of Algorithm 1.


Consider a non-cooperative game among three agents
$\mathcal{K} = \{1,2,3\}$ with action set $\mathcal{A} = \{1,2\}$.
Agents 1 and 2 exhibit identical homophilic characteristics and,
hence, form a social group. That is,
$\mathcal{E} = \{(1,2),(2,1)\}$ in the network connectivity
graph $\mathcal{G}$; see Definition 2.2. In contrast, agent 3 is
isolated from agents 1 and 2 and, in fact, unaware of their
existence. Table I presents agents’ utilities in normal form: each
element $(x, y, z)$ in the table represents the utility of agents
1, 2, and 3, respectively, corresponding to the particular choice
of action. Note in Table I that the game is symmetric between
agents 1 and 2. Such situations arise in social networks when a
homophilic group of agents aims to coordinate their decisions
in response to the actions of other (homophilic groups of)
agents. Further, we set


$$C = \begin{bmatrix} -0.25 & 0.25 \\ 0.25 & -0.25 \end{bmatrix}$$

in the weight matrix $W$. That is, agents 1 and 2 place 1/4
weight on the information they receive from their neighbor on
the connectivity graph, and 3/4 on their own beliefs. We further
set the exploration factor $\delta = 0.15$ in the decision
strategy, and the step-size $\varepsilon = 0.01$ in (10).

As the benchmark, we use the standard reinforcement learning
procedure [?] to evaluate the performance of Algorithm 1.
However, we replace its decreasing step-size with the constant
step-size $\varepsilon$ so as to make the two algorithms both
adaptive and comparable. In view of the last step in the proof of
Theorem 4.1 in Appendix A, the distance to the polytope
of correlated equilibrium can be evaluated by the distance of
the agents’ regrets to the negative orthant. More precisely, we
quantify the distance to correlated equilibrium set by


dn = max ■ ( 18 ) 

Fig. 3 shows how $d_n$ diminishes with time $n$ for both
algorithms. Each point of the depicted sample paths is an
average over 100 independent runs of the algorithms. As is
clearly evident, cooperation with neighboring agents over the
network topology improves the rate of convergence to the
correlated equilibrium. Algorithm 1 outperforms the
reinforcement learning procedure [?] particularly in the initial
stages of the learning process, where sharing experiences
(regrets) with neighbors leads to $d_n$ monotonically decreasing
with $n$.


V. Detection of Equilibrium Play in Games 

We now move on to the second part of the paper, namely,
using the principle of revealed preferences to detect equilibrium
play of agents in a social network. The setup is depicted in
Fig. 4. The main questions addressed are: Is it possible to detect
if the agents are utility maximizers? If yes, can the behavior of
the agents be learned using the data from the social network?
As mentioned in Sec. I, these questions are fundamentally
different to the model-based theme that is widely used in the
signal processing literature in which an objective function
(typically convex) is proposed and then algorithms are constructed
to compute the minimum. In contrast, the revealed preference
approach is data-centric: we wish to determine whether the
dataset is obtained from the interaction of utility maximizers.
















Classical revealed preference theory seeks to determine if an
agent is a utility maximizer subject to a budget constraint
based on observing its actions over time and is widely studied
in the micro-economics literature. The reader is referred to the
works [22] by Varian (chief economist at Google) for details.



Fig. 4. Schematic of a social network containing $n$ interacting
agents where $p_t \in \mathbb{R}^m_+$ denotes the external influence,
and $x_t^i \in \mathbb{R}^m_+$ the action of agent $i$ in response to
the external influence and other agents at time $t$. Note that the
dotted line denotes consumers $4, \dots, n-1$. The aim is to
determine if the dataset
$\mathcal{D} = \{(p_t, x_t^1, x_t^2, \dots, x_t^n) : t \in \{1,2,\dots,T\}\}$
is consistent with play from a Nash equilibrium of players engaged
in a concave potential game.


A. Preliminaries: Utility Maximization and Afriat’s Theorem 

Deterministic revealed preference tests for utility maximization
were pioneered by Afriat [?], and further developed by
Diewert [42] and Varian [23]. Given a time-series of data
$\mathcal{D} = \{(p_t, x_t), t \in \{1,2,\dots,T\}\}$ where
$p_t \in \mathbb{R}^m_+$ denotes the external influence, $x_t$ denotes
the action of an agent, and $t$ denotes the time index, is it
possible to detect if the agent is a utility maximizer? An agent is
a utility maximizer at each time $t$ if for every external
influence $p_t$, the selected action $x_t$ satisfies

$$x_t(p_t) \in \arg\max_{\{p_t' x \le I_t\}} u(x) \qquad (19)$$


with $u(x)$ a non-satiated utility function. Non-satiated means
that an increase in any element of action $x$ results in the
utility function increasing. The non-satiated assumption rules
out trivial cases such as a constant utility function which can
be optimized by any action, and as shown by Diewert [42],
without local non-satiation the maximization problem (19)
may have no solution. In (19) the social budget constraint
$p_t' x_t \le I_t$ denotes the total amount of resources available
to the social sensor for selecting the action $x_t$ in response to
the external influence $p_t$. An example is the aggregate power
consumption of agents in the energy market. The external
influence is the cost of using a particular resource, and the
action is the amount of resources used. The social impact
budget is therefore the total cost of using the resources and
is given by $p_t' x_t = I_t$. Further insight into the social impact
budget constraint is provided in Sec. VI.


The celebrated “Afriat’s theorem” provides necessary and
sufficient conditions for a finite dataset $\mathcal{D}$ to have
originated from a utility maximizer.

Theorem 5.1 (Afriat’s Theorem): Given a dataset
$\mathcal{D} = \{(p_t, x_t) : t \in \{1,2,\dots,T\}\}$, the following
statements are equivalent:

1) The agent is a utility maximizer and there exists a
non-satiated and concave utility function that satisfies (19).

2) For scalars $u_t$ and $\lambda_t > 0$ the following set of
inequalities has a feasible solution:

$$u_\tau - u_t - \lambda_t p_t'(x_\tau - x_t) \le 0 \quad \text{for } t, \tau \in \{1,2,\dots,T\}. \qquad (20)$$

3) A non-satiated and concave utility function that satisfies
(19) is given by:

$$u(x) = \min_{t \in \{1,2,\dots,T\}} \{u_t + \lambda_t p_t'(x - x_t)\} \qquad (21)$$

4) The dataset $\mathcal{D}$ satisfies the Generalized Axiom of
Revealed Preference (GARP), namely for any $k \le T$,
$p_t' x_t \ge p_t' x_{t+1} \ \forall t \le k-1 \implies p_k' x_k \le p_k' x_1$. ■
As pointed out in [23], a remarkable feature of Afriat’s
theorem is that if the dataset can be rationalized by a non¬ 
trivial utility function, then it can be rationalized by a contin¬ 
uous, concave, monotonic utility function. “Put another way, 
violations of continuity, concavity, or monotonicity cannot be 
detected with only a finite number of demand observations”. 
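To make statements 2) and 3) of Theorem 5.1 concrete, the following
Python sketch checks whether a candidate pair $(u_t, \lambda_t)$
satisfies Afriat’s inequalities (20) and evaluates the
piecewise-linear utility (21). It is an illustrative sketch only:
the function names and the two-point dataset are hypothetical, and
the candidate scalars are assumed to be supplied by the user (for
instance by an LP solver) rather than computed here.

```python
def dot(p, x):
    """Inner product p'x of two equal-length sequences."""
    return sum(pi * xi for pi, xi in zip(p, x))


def afriat_inequalities_hold(prices, actions, u, lam, tol=1e-9):
    """Check (20): u_tau - u_t - lam_t * p_t'(x_tau - x_t) <= 0 for all t, tau."""
    T = len(prices)
    if any(l <= 0 for l in lam):
        return False  # the multipliers lam_t must be strictly positive
    for t in range(T):
        for tau in range(T):
            diff = [a - b for a, b in zip(actions[tau], actions[t])]
            if u[tau] - u[t] - lam[t] * dot(prices[t], diff) > tol:
                return False
    return True


def afriat_utility(x, prices, actions, u, lam):
    """Evaluate (21): lower envelope of the T hyperplanes at the point x."""
    return min(
        u[t] + lam[t] * dot(prices[t], [xi - ai for xi, ai in zip(x, actions[t])])
        for t in range(len(prices))
    )


if __name__ == "__main__":
    # Hypothetical two-observation dataset, used purely for illustration.
    prices = [(1.0, 2.0), (2.0, 1.0)]
    actions = [(4.0, 1.0), (1.0, 4.0)]
    u, lam = [0.0, 0.0], [1.0, 1.0]
    print(afriat_inequalities_hold(prices, actions, u, lam))
    print([afriat_utility(x, prices, actions, u, lam) for x in actions])
```

When (20) holds, the reconstructed utility satisfies
$u(x_t) = u_t$ at every observed action, which is what makes the
lower envelope rationalize the data.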

Verifying GARP (statement 4 of Theorem 5.1) on a dataset
$\mathcal{D}$ comprising $T$ points can be done using Warshall’s
algorithm with $O(T^3)$ computations. Alternatively, determining if
Afriat’s inequalities (20) are feasible can be done via an LP
feasibility test (using for example interior point methods [43]).
Note that the utility (21) is not unique and is ordinal by
construction. Ordinal means that any monotone increasing
transformation of the utility function will also satisfy Afriat’s
theorem. Therefore the utility mimics the ordinal behavior of
humans. Geometrically the estimated utility (21) is the lower
envelope of a finite number of hyperplanes that is consistent
with the dataset $\mathcal{D}$.
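The Warshall-based GARP check mentioned above can be sketched as
follows. The sketch uses the standard formulation of GARP (if
$x_t$ is revealed preferred to $x_s$, directly or through a chain,
then $x_s$ must not be strictly directly revealed preferred to
$x_t$); the function name and the toy datasets are hypothetical.

```python
def satisfies_garp(prices, actions):
    """Check GARP with Warshall's transitive closure, in O(T^3) operations."""
    T = len(prices)
    # cost[t][s] = p_t' x_s, the expenditure on bundle s at the prices of period t
    cost = [
        [sum(p * x for p, x in zip(prices[t], actions[s])) for s in range(T)]
        for t in range(T)
    ]
    # Direct revealed preference: x_t R0 x_s if x_s was affordable when x_t was chosen
    R = [[cost[t][t] >= cost[t][s] for s in range(T)] for t in range(T)]
    # Warshall's algorithm: transitive closure of the relation R0
    for k in range(T):
        for t in range(T):
            if R[t][k]:
                for s in range(T):
                    if R[k][s]:
                        R[t][s] = True
    # Violation: x_t R x_s while x_s is strictly directly preferred to x_t
    for t in range(T):
        for s in range(T):
            if R[t][s] and cost[s][s] > cost[s][t]:
                return False
    return True


if __name__ == "__main__":
    prices = [(1, 2), (2, 1)]
    print(satisfies_garp(prices, [(4, 1), (1, 4)]))  # consistent choices
    print(satisfies_garp(prices, [(1, 4), (4, 1)]))  # each bundle was cheaper in the other period
```

In the second dataset each chosen bundle was strictly cheaper at
the other period’s prices, so the two observations contradict each
other and no non-satiated utility can rationalize them.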


B. Decision Test for Nash Rationality 

We now consider a version of Afriat’s theorem for deciding
if a dataset $\mathcal{D}$ from a social network is generated by
agents playing from the equilibrium of a potential game. Potential
games have been used in telecommunication networking for
tasks such as routing, congestion control, power control in
wireless networks, and peer-to-peer file sharing [?], and
in social networks to study the diffusion of technologies,
advertisements, and influence [?].

Consider the social network of interconnected agents in
Fig. 4. Given a time-series of data from $n$ agents
$\mathcal{D} = \{(p_t, x_t^1, \dots, x_t^n) : t \in \{1,2,\dots,T\}\}$
with $p_t \in \mathbb{R}^m_+$ the external influence, $x_t^i$ the
action of agent $i$, and $t$ the time index, is it possible to
detect if the dataset originated from agents that play a potential
game? In Fig. 4, the actions of agents are dependent on both the
external influence $p_t$ and the actions of the other agents in the
social network. The utility function of the agent now includes the
actions of other agents: formally, if there are $n$ agents, each
has a utility function $u^i(x^i, x_t^{-i})$ with $x^i$ denoting the
action of agent $i$, $x_t^{-i}$ the actions of the other $n-1$
agents, and $u^i(\cdot)$ the utility of agent $i$. Given a dataset
$\mathcal{D}$, is it possible to detect if the data is consistent
with agents playing a game and maximizing their individual
utilities? Deb, following Varian’s and Afriat’s
work, shows that refutable restrictions exist for the dataset
$\mathcal{D}$, given by (22), to satisfy Nash equilibrium [10].
These refutable restrictions are, however, satisfied by most
$\mathcal{D}$. The detection of agents engaged in a concave
potential game, and generating actions that satisfy Nash
equilibrium, provides stronger restrictions on the dataset
$\mathcal{D}$ [?]. We denote this behaviour as Nash rationality,
defined as follows:

Definition 5.1 ([?]): Given a dataset

$$\mathcal{D} = \{(p_t, x_t^1, x_t^2, \dots, x_t^n) : t \in \{1,2,\dots,T\}\}, \qquad (22)$$

$\mathcal{D}$ is consistent with Nash equilibrium play if there
exist utility functions $u^i(x^i, x^{-i})$ such that

$$x_t^i = x_t^{i*}(p_t) \in \arg\max_{\{p_t' x^i \le I_t^i\}} u^i(x^i, x^{-i}). \qquad (23)$$

In (23), $u^i(x, x^{-i})$ is a non-satiated utility function in $x$,
$x^{-i} = \{x^j\}_{j \ne i}$ for $i, j \in \{1,2,\dots,n\}$, and the
elements of $p_t$ are strictly positive. Non-satiated means that
for any $\varepsilon > 0$, there exists an $x^i$ with
$\|x^i - x_t^i\|_2 < \varepsilon$ such that
$u^i(x^i, x^{-i}) > u^i(x_t^i, x^{-i})$. If for all
$x^i, x^j \in X^i$, there exists a concave potential function $V$
that satisfies

$$u^i(x^i, x^{-i}) - u^i(x^j, x^{-i}) > 0 \ \ \text{iff} \ \ V(x^i, x^{-i}) - V(x^j, x^{-i}) > 0 \qquad (24)$$

for all the utility functions $u^i(\cdot)$ with
$i \in \{1,2,\dots,n\}$, then the dataset $\mathcal{D}$ satisfies
Nash rationality. ■

Just as with the utility maximization budget constraint in (19),
the budget constraint $p_t' x_t^i \le I_t^i$ in (23) models the total
amount of resources available to the agent for selecting the
action $x_t^i$ in response to the external influence $p_t$.

The following theorem provides necessary and sufficient
conditions for a dataset $\mathcal{D}$ (22) to be consistent with
Nash rationality (Definition 5.1). The proof is analogous to
Afriat’s Theorem when the concave potential function of the game
is differentiable [?].

Theorem 5.2 (Multi-agent Afriat’s Theorem): Given a
dataset $\mathcal{D}$ (22), the following statements are equivalent:

1) $\mathcal{D}$ is consistent with Nash rationality
(Definition 5.1) for an $n$-player concave potential game.

2) Given scalars $v_t$ and $\lambda_t^i > 0$ the following set of
inequalities has a feasible solution for $t, \tau \in \{1,\dots,T\}$,

$$v_\tau - v_t - \sum_{i=1}^{n} \lambda_t^i p_t'(x_\tau^i - x_t^i) \le 0. \qquad (25)$$

3) A concave potential function that satisfies (23) is given
by:

$$\hat{V}(x^1, x^2, \dots, x^n) = \min_{t \in \{1,\dots,T\}} \Big\{ v_t + \sum_{i=1}^{n} \lambda_t^i p_t'(x^i - x_t^i) \Big\}. \qquad (26)$$

4) The dataset $\mathcal{D}$ satisfies the Potential Generalized
Axiom of Revealed Preference (PGARP) if the following two
conditions are satisfied.

a) For every dataset
$\mathcal{D}_\tau^i = \{(p_t, x_t^i) : t \in \{1,2,\dots,\tau\}\}$
for all $i \in \{1,\dots,n\}$ and all $\tau \in \{1,\dots,T\}$,
$\mathcal{D}_\tau^i$ satisfies GARP.

b) The actions $x_t^i$ originated from agents in a concave
potential game. ■

Note that if only a single agent (i.e. $n = 1$) is considered,
then Theorem 5.2 is identical to Afriat’s Theorem. Similar
to Afriat’s Theorem, the constructed concave potential function
(26) is ordinal, that is, unique up to positive monotone
transformations. Therefore several possible options for
$\hat{V}(\cdot)$ exist that would produce identical preference
relations to the actual potential function $V(\cdot)$. In 4) of
Theorem 5.2 the first condition only provides necessary and
sufficient conditions for the dataset $\mathcal{D}$ to be
consistent with a Nash equilibrium of a game, therefore the second
condition is required to ensure consistency with the other
statements in the Multi-agent Afriat’s Theorem. The intuition that
connects statements 1 and 3 in Theorem 5.2 is provided by the
following result from [?]: for any smooth potential game that
admits a concave potential function $V$, a sequence of responses
$\{x^i\}_{i \in \{1,2,\dots,n\}}$ is generated by a pure-strategy
Nash equilibrium if and only if it is a maximizer of the potential
function,

$$x_t = \{x_t^1, x_t^2, \dots, x_t^n\} \in \arg\max V(\{x^i\}_{i \in \{1,2,\dots,n\}}) \quad \text{s.t. } p_t' x^i \le I_t^i \ \ \forall i \in \{1,2,\dots,n\} \qquad (27)$$

for each probe vector $p_t \in \mathbb{R}^m_+$.

The non-parametric test for Nash rationality involves
determining if (25) has a feasible solution. Computing parameters
$v_t$ and $\lambda_t^i > 0$ in (25) involves solving a linear
program with linear constraints in $(n+1)T$ variables, which has
polynomial time complexity [43]. In the special case of one
agent, the constraint set in (25) is the dual of the shortest path
problem in network flows. The parameters $u_t$ and $\lambda_t$ in
(20) can be computed using Warshall’s algorithm with $O(T^3)$
computations.

C. Statistical Test for Nash Rationality

In real world analysis a dataset may fail the Nash rationality
test (25) as a result of the agents’ actions $x_t$ being measured
in noise. In this section a statistical test is provided to detect
Nash rationality when the actions are measured in noise.

Here we consider additive noise $w_t$ such that the measured
dataset is given by:

$$\mathcal{D}_{\text{obs}} = \{(p_t, y_t^1, y_t^2, \dots, y_t^n) : t \in \{1,2,\dots,T\}\}, \qquad (28)$$

consisting of external influences $p_t$ and noisy observations of
the agents’ actions $y_t^i = x_t^i + w_t^i$. In such cases a
feasibility test is required to test if the clean dataset
$\mathcal{D}$ satisfies Nash rationality (25). Let $H_0$ and $H_1$
denote the null hypothesis that the clean dataset $\mathcal{D}$
satisfies Nash rationality, and the alternative hypothesis that
$\mathcal{D}$ does not satisfy Nash rationality. In devising a
statistical test for $H_0$ vs $H_1$, there are two possible sources
of error:

Type-I errors: Reject $H_0$ when $H_0$ is valid.
Type-II errors: Accept $H_0$ when $H_0$ is invalid.        (29)


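Given the noisy dataset $\mathcal{D}_{\text{obs}}$ (28), the
following statistical test can be used to detect if a group of
agents select actions that satisfy Nash equilibrium (23) when
playing a concave potential game:

$$\int_{\Phi^*(y)}^{+\infty} f_M(\beta)\, d\beta \ \underset{H_1}{\overset{H_0}{\gtrless}} \ \gamma. \qquad (30)$$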
In the statistical test (30):

(i) $\gamma$ is the “significance level” of the statistical test.

(ii) The “test statistic” $\Phi^*(y)$ is the solution of the
following constrained optimization problem for
$y = \{(y_t^1, y_t^2, \dots, y_t^n)\}_{t \in \{1,2,\dots,T\}}$:

$$\begin{aligned} \min \ & \Phi \\ \text{s.t.} \ & v_\tau - v_t - \sum_{i=1}^{n} \lambda_t^i p_t'(y_\tau^i - y_t^i) - \sum_{i=1}^{n} \lambda_t^i \Phi \le 0 \\ & \lambda_t^i > 0, \quad \Phi \ge 0 \quad \text{for } t, \tau \in \{1,2,\dots,T\}. \end{aligned} \qquad (31)$$

(iii) $f_M$ is the probability density function of the random
variable $M$ where

$$M = \max_{t,\tau} \Big[ \sum_{i=1}^{n} p_t'(w_t^i - w_\tau^i) \Big]. \qquad (32)$$

The following theorem characterizes the performance of the
statistical test (30). The proof is in the appendix.

Theorem 5.3: Consider the noisy dataset
$\mathcal{D}_{\text{obs}}$ (28) of external influences and actions.
The probability that the statistical test yields a Type-I error
(rejects $H_0$ when it is true) is less than $\gamma$. (Recall
$H_0$ and $H_1$ are defined in (29).) ■

Note that (31) is non-convex due to
$\sum_{i=1}^{n} \lambda_t^i \Phi$; however, since the objective
function is given by the scalar $\Phi$, for any fixed value of
$\Phi$, (31) becomes a set of linear inequalities allowing
feasibility to be straightforwardly determined [47].
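The two computational ingredients of the test can be sketched in
Python as follows. This is a minimal illustration under stated
assumptions, not the authors’ implementation: the noise is assumed
i.i.d. Gaussian with a known standard deviation, the tail integral
in (30) is replaced by an empirical frequency over simulated draws
of $M$ from (32), and the LP feasibility check of (31) for a fixed
$\Phi$ is abstracted as a caller-supplied function `is_feasible`
(hypothetical). Because $\lambda_t^i > 0$, increasing $\Phi$ only
relaxes the constraints in (31), so feasibility is monotone in
$\Phi$ and bisection applies.

```python
import random


def sample_M(prices, n_agents, sigma, rng):
    """One draw of M in (32), assuming i.i.d. N(0, sigma^2) measurement noise."""
    T, m = len(prices), len(prices[0])
    w = [[[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(n_agents)]
         for _ in range(T)]
    best = float("-inf")
    for t in range(T):
        for tau in range(T):
            val = sum(prices[t][j] * (w[t][i][j] - w[tau][i][j])
                      for i in range(n_agents) for j in range(m))
            best = max(best, val)
    return best


def tail_probability(phi_star, m_samples):
    """Empirical estimate of the integral in (30), i.e. of P(M > phi_star)."""
    return sum(1 for s in m_samples if s > phi_star) / len(m_samples)


def accept_h0(phi_star, m_samples, gamma):
    """Decision rule (30): accept H0 when the tail probability exceeds gamma."""
    return tail_probability(phi_star, m_samples) > gamma


def bisect_phi(is_feasible, upper, tol=1e-6):
    """Smallest Phi in [0, upper] for which (31) is feasible.

    is_feasible(phi) is a hypothetical oracle (e.g. an LP feasibility solve)
    and is assumed to return True at phi = upper.
    """
    if is_feasible(0.0):
        return 0.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    rng = random.Random(0)
    prices = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]  # illustrative probes
    draws = [sample_M(prices, n_agents=3, sigma=0.1, rng=rng) for _ in range(500)]
    phi_star = bisect_phi(lambda phi: phi >= 0.37, upper=10.0)  # stand-in oracle
    print(phi_star, accept_h0(phi_star, draws, gamma=0.05))
```

In practice the stand-in oracle would be replaced by an LP
feasibility solve of (31) at the given $\Phi$.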


D. Stochastic Gradient Algorithm to Minimize Type-II Errors 

Theorem 5.3 above guarantees the probability of Type-I
errors is less than $\gamma$ for the statistical test (30) for the
detection of Nash rationality (Definition 5.1) from a noisy
dataset $\mathcal{D}_{\text{obs}}$ (28). In this section, the
statistical test (30) is enhanced by adaptively optimizing the
external influence vectors $p = [p_1, p_2, \dots, p_T]$ to reduce
Type-II errors.

Reducing the Type-II error probability can be achieved
by dynamically optimizing the external influence
$p = [p_1, \dots, p_T]$. The external influence $p$ is selected as
the solution of

$$p^* \in \arg\min_{p \in \mathbb{R}_+^{m \times T}} J(p), \qquad J(p) = \underbrace{P\Big( \int_{\Phi^*(y)}^{+\infty} f_M(\beta)\, d\beta > \gamma \ \Big|\ \{p, x(p)\} \in \mathcal{A} \Big)}_{\text{Probability of Type-II error}}. \qquad (33)$$

In (33), $y = x(p) + w$ with $y$ defined above (31), $f_M$ is the
probability density function of the random variable $M$ (32), and
$P(\cdots|\cdot)$ denotes the conditional probability that (30)
accepts $H_0$ for all agents given that $H_0$ is false. The set
$\mathcal{A}$ contains all elements $\{p, x(p)\}$, with
$x(p) = [x^1(p_t), \dots, x^n(p_t)]$, where $\{p, x(p)\}$ does not
satisfy Nash rationality (Definition 5.1).

To compute (33) requires a stochastic optimization algorithm
as the probability density functions $f_M$ are not known
explicitly. Given that we must estimate the gradient of the
objective function in (33), and that
$p \in \mathbb{R}_+^{m \times T}$ can comprise a large dimensional
matrix, the simultaneous perturbation stochastic approximation
(SPSA) algorithm is utilized to estimate $p$ from (33) [?]. The
SPSA allows the gradient to be estimated using only two
measurements of the objective function corrupted by noise, and for
decreasing step size the algorithm with probability one reaches a
local stationary point. The SPSA algorithm used to compute $p$ is
provided below.

Step 1: Choose initial probe
$p_0 = [p_1, p_2, \dots, p_T] \in \mathbb{R}_+^{m \times T}$.

Step 2: For iterations $q = 1, 2, 3, \dots$

  Estimate the cost (i.e. probability of Type-II errors)
  in (33) using

  $$J_q(p_q) = \frac{1}{K} \sum_{k=1}^{K} I\Big( 1 - \hat{F}_M(\Phi^*(y_k)) > \gamma \Big) \qquad (34)$$

  where $I$ denotes the indicator function, and $\hat{F}_M(\cdot)$ is
  an estimate of the cumulative distribution function of $M$
  constructed by generating random samples according to (32).
  In (34), $\Phi^*(y_k)$ is computed using (31) with the noisy
  observations $y_k = x(p_q) + w_k$. Note that $w_k$ is a fixed
  realization of $w$, and the dataset
  $\{p_q, x(p_q)\} \in \mathcal{A}$ defined below (33). The
  parameter $K$ in (34) controls the accuracy of the empirical
  probability of Type-II errors (33).

  Compute the gradient estimate $\hat{\nabla}_p J_q(p_q)$:

  $$\hat{\nabla}_p J_q(p_q) = \frac{J_q(p_q + \Delta_q \sigma) - J_q(p_q - \Delta_q \sigma)}{2 \sigma \Delta_q}, \qquad \Delta_q(i) = \begin{cases} -1 & \text{with probability } 0.5 \\ +1 & \text{with probability } 0.5 \end{cases} \qquad (35)$$

  with gradient step size $\sigma > 0$.

  Update the probe vector $p_q$ with step size $\varepsilon > 0$:

  $$p_{q+1} = p_q - \varepsilon \hat{\nabla}_p J_q(p_q).$$

The benefit of using the SPSA algorithm is that the estimated
gradient $\hat{\nabla}_p J_q(p_q)$ in (35) can be computed using only
two measurements of the function (34) per iteration; see [?] for
the convergence and tutorial exposition of the SPSA algorithm.
In particular, for constant step size $\varepsilon$, it converges
weakly (in probability) to a local stationary point [?].
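The SPSA iteration above can be sketched in Python as follows. The
sketch is illustrative only: the cost passed in is a hypothetical
stand-in for the empirical Type-II error estimate (34), whose
evaluation would require solving (31) for each noisy realization,
and the clipping of the probe to strictly positive values is an
assumption added here to respect $p \in \mathbb{R}_+^{m \times T}$.
The probe is flattened into a plain list.

```python
import random


def spsa_minimize(cost, p0, iters, sigma, eps, rng, lower=1e-3):
    """Constant step-size SPSA following the gradient estimate (35).

    cost  : callable returning a (possibly noisy) cost for a probe vector
    sigma : gradient step size in (35)
    eps   : step size of the probe update
    lower : assumed positivity floor for the probe entries (not from the paper)
    """
    p = list(p0)
    for _ in range(iters):
        # Simultaneous perturbation: every coordinate is moved by +/- sigma at once
        delta = [rng.choice((-1.0, 1.0)) for _ in p]
        plus = [pi + sigma * d for pi, d in zip(p, delta)]
        minus = [pi - sigma * d for pi, d in zip(p, delta)]
        # Only two cost evaluations per iteration, whatever the dimension of p
        diff = cost(plus) - cost(minus)
        grad = [diff / (2.0 * sigma * d) for d in delta]
        p = [max(lower, pi - eps * g) for pi, g in zip(p, grad)]
    return p


if __name__ == "__main__":
    # Hypothetical smooth cost with minimizer at (2, 2), used only to exercise the loop.
    quadratic = lambda p: sum((pi - 2.0) ** 2 for pi in p)
    rng = random.Random(1)
    print(spsa_minimize(quadratic, [3.0, 1.0], iters=400, sigma=0.1, eps=0.05, rng=rng))
```

The cost of one iteration is two evaluations of the objective
regardless of the number of probe entries, which is the property
that motivates SPSA when the probe is a large matrix.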


VI. Examples of Equilibrium Play: Energy Market
and Detection of Malicious Agents

In this section we provide two examples of how the decision
test (25), statistical detection test (30), and stochastic
optimization algorithm (35) from Sec. V can be applied to detect
Nash rationality (Definition 5.1) in a social network. The
first example uses real-world aggregate power consumption
data from the Ontario energy market social network. The
second is the detection of malicious agents in an online social
network comprised of normal agents, malicious agents, and an
authentication agent.


A. Nash Rationality in Ontario Electrical Energy Market 

In this section we consider the aggregate power consumption
of different zones in the Ontario power grid. A sampling
period of $T = 79$ days starting from January 2014 is used to
generate the dataset $\mathcal{D}$ for the analysis. All price and
power consumption data is available from the Independent
Electricity System Operator¹ (IESO) website. Each zone is
considered as an agent in the corporate network illustrated in
Fig. 5. The study of corporate social networks was pioneered by
Granovetter [?], which shows that the social structure of the
network can have important economic outcomes. Examples
include agents’ choice of alliance partners, assumption of
rational behavior, self-interest behavior, and the learning of
other agents’ behavior. Here we test for rational behavior (i.e.
utility maximization and Nash rationality), and if true then learn
the associated behavior of the zones. This analysis provides useful
information for constructing demand side management (DSM)
strategies for controlling power consumption in the electricity
market.



Fig. 5. Schematic of the electrical distribution network in the Ontario power 
grid. The nodes 1, 2,..., 10 correspond to the distribution zones: Northwest, 
Northeast, Bruce, Southwest, Essa, Ottawa, West, Niagara, Toronto, and East. 
The dotted circles indicate zones with external interconnections; these nodes
can import/export power to the external network which includes Manitoba, 
Quebec, Michigan, and New York. The network can be considered a corporate 
social network of chief financial officers. 


The zones’ power consumption is regulated by the associated
price of electricity set by the senior management officer in
each respective zone. Since there is a finite amount of power
in the grid, each officer must communicate with other officers
in the network to set the price of electricity. Here we utilize
the aggregate power consumption from each of the $n = 10$
zones in the Ontario power grid and apply the non-parametric
tests for utility maximization (20) and Nash rationality (25)
to detect if the zones are demand responsive. If the utility
maximization or Nash rationality tests are satisfied, then the
power consumption behaviour is modelled by constructing the
associated utility function (21) or concave potential function
of the game (26).

To perform the analysis the external influence $p_t$ and action
of agents $x_t$ must be defined. In the Ontario power grid
the wholesale price of electricity is dependent on several
factors such as consumer behaviour, weather, and economic
conditions. Therefore the external influence is defined as
$p_t = [p_t(1), p_t(2)]$ with $p_t(1)$ the average electricity price
between midnight and noon, and $p_t(2)$ the average between
noon and midnight, with $t$ denoting day. The action of each
zone corresponds to the total aggregate power consumption in
each respective time period associated with $p_t(1)$ and $p_t(2)$
and is given by $x_t^i = [x_t^i(1), x_t^i(2)]$ with
$i \in \{1,2,\dots,n\}$. The budget $I_t^i$ of each zone has units
of dollars as $p_t$ has units of \$/kWh and $x_t^i$ units of kWh.

Fig. 6. Average consumption (gray) and associated noise level
$\sigma$ (black) for the price and demand data to satisfy utility
maximization in each of the $1, \dots, 10$ zones in the Ontario
power grid defined in Fig. 5. The average hourly consumption is
computed over the $T = 79$ days starting from January 2014.

We found that the aggregate consumption data of each zone
does not satisfy utility maximization (20). Is this a result
of measurement noise? Assuming the power consumption of
agents are independent and identically distributed, the central
limit theorem suggests that the aggregate consumption of
regions follows a zero mean normal distribution with variance
$\sigma^2$. The noise term $w$ in (32) is given by the normal
distribution $\mathcal{N}(0, \sigma^2)$. Therefore, to test if the
failure is a result of noise, the statistical test (30) is applied
for each region, and the noise level $\sigma^2$ estimated for the
dataset $\mathcal{D}_{\text{obs}}$ to satisfy the $\gamma = 95\%$
confidence interval for utility maximization. The results are
provided in Fig. 6. As seen from Fig. 6, the Essa, West, Toronto,
and East zones do not satisfy the utility maximization
requirement. This results as the required noise level $\sigma$ for
the stochastic utility maximization test to pass is too high
compared with the average power consumption. Therefore if each
zone is independently maximizing then only 60% of the Ontario
power grid satisfies the utility maximization test. However it is
likely that the zones are engaged in a concave potential game;
this would not be a surprising result as network congestion games
have been shown to reduce peak power demand in distributed demand
management schemes.

¹http://ieso-public.sharepoint.com/

To test if the dataset $\mathcal{D}$ is consistent with Nash
rationality the detection test (25) is applied. The dataset for
the power consumption in the Ontario power grid is consistent with
Nash rationality. Using (25) and (26), a concave potential
function for the game is constructed. Using the constructed
potential function, when do agents prefer to consume power?
The marginal rate of substitution² (MRS) can be used to
determine the preferred time for power usage. Formally, the
MRS of $x^i(1)$ for $x^i(2)$ is given by

$$\mathrm{MRS}_{12} = \frac{\partial \hat{V} / \partial x^i(1)}{\partial \hat{V} / \partial x^i(2)}.$$

From the constructed potential function we find that
$\mathrm{MRS}_{12} > 1$, suggesting that the agents prefer to use
power in the time period associated with $x_t^i(1)$; that is, the
agents are willing to give up $\mathrm{MRS}_{12}$ kWh of power in
the time period associated with $x_t^i(2)$ for 1 additional kWh of
power in the time period associated with $x_t^i(1)$.
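For the piecewise-linear potential (26), the MRS has a simple form
wherever a single hyperplane attains the minimum: the partial
derivatives are $\lambda_{t^*}^i p_{t^*}(1)$ and
$\lambda_{t^*}^i p_{t^*}(2)$, where $t^*$ is the index of the
active hyperplane, so the agent weight cancels and the MRS reduces
to the price ratio $p_{t^*}(1)/p_{t^*}(2)$. The Python sketch below
illustrates this under that uniqueness assumption; the function
names and the small dataset are hypothetical and are not the
Ontario data.

```python
def potential_and_active_index(x, prices, actions, v, lam):
    """Evaluate the potential (26) at x and return (value, index of the active piece).

    x          : list over agents of action vectors x^i
    actions[t] : list over agents of the observed actions x_t^i
    lam[t]     : list over agents of the multipliers lambda_t^i
    """
    best_val, best_t = None, None
    for t in range(len(prices)):
        val = v[t]
        for i, xi in enumerate(x):
            val += lam[t][i] * sum(
                p * (a - b) for p, a, b in zip(prices[t], xi, actions[t][i])
            )
        if best_val is None or val < best_val:
            best_val, best_t = val, t
    return best_val, best_t


def mrs_12(x, prices, actions, v, lam):
    """MRS of good 1 for good 2 at x, assuming a unique active hyperplane at x."""
    _, t_star = potential_and_active_index(x, prices, actions, v, lam)
    return prices[t_star][0] / prices[t_star][1]


if __name__ == "__main__":
    # Hypothetical single-agent data; (v, lam) satisfy the inequalities (25).
    prices = [[2.0, 1.0], [1.0, 3.0]]
    actions = [[[1.0, 1.0]], [[2.0, 0.5]]]
    v, lam = [0.0, 0.0], [[1.0], [1.0]]
    x = [[1.0, 1.0]]
    print(potential_and_active_index(x, prices, actions, v, lam))
    print(mrs_12(x, prices, actions, v, lam))
```

An MRS above one at an observed action therefore corresponds to
the first-period price exceeding the second-period price on the
active hyperplane.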

The analysis in this section suggests that the power
consumption behavior of agents is consistent with players engaged
in a concave potential game. Using the Multi-agent Afriat’s
Theorem the agents’ preference for using power was estimated.
This information can be used to improve the DSM strategies
to control power consumption in the electricity market.

²The amount of one good that an agent is willing to give up in
exchange for another good while maintaining the same level of
utility.


B. Detecting Malicious Agents in Online Social Networks 

Socialbots and spambots are autonomous programs which 
attempt to imitate human behavior and are prevalent on pop¬ 
ular social networking sites such as Facebook and Twitter. In 
this section we consider the detection of malicious agents in an 
online social network comprised of normal agents, malicious 
agents, and a network authentication agent as depicted in Fig. 7.



Fig. 7. Schematic of an online social network with a network
authentication agent that is able to create fictitious agents
(black) to interact with real agents in the social network. Two
types of agents are considered: normal agents (white), and
malicious agents (grey). The goal is for the authentication
agent to be able to detect and eliminate malicious agents from the
online social network. The parameters $p_t$ and actions $x_t^i$ are
defined in Sec. VI-B.


Recent techniques for detecting malicious agents in the
social network (i.e. socialbots and spambots) use a method
known as behavioural blacklisting which attempts to detect
emails, tweets, friend and follower requests, and URLs which
have originated from malicious agents [?]. Behavioural
blacklisting works as socialbots and spambots tend to have
different behaviors than humans. For example in Twitter,
socialbots and spambots tend to re-tweet far more than normal
(i.e. human) agents, and by contrast normal accounts tend to
receive more replies, mentions, and re-tweets [?]. The goal of
malicious agents is to increase their connectivity in the social
network for purposes such as delivering harmful content (e.g.
viruses), gaining followers and friends, marketing, and political
campaigning. Consider the network topology depicted in Fig. 7. The
authentication agent is designed to detect and eliminate malicious
agents in the network. To this end the authentication agent is
able to construct fictitious accounts to study the actions of
other agents in the network. Denoting $p_t \in \mathbb{R}^m_+$ as
queries for authentication for the $m$ fictitious accounts
produced by the authentication node at time $t$, the response of
agent $i$ is given by $x_t^i \in \mathbb{R}^m_+$ and is the total
number of successfully targeted followers and friends of each of
the $m$ fictitious accounts. Note that larger values of $p_t$
indicate a stronger query for authentication. We consider the
following utility function for malicious agents:


N{x\x-^)=\n 


x^{l)x^{2) 




\x\x-^) =r\x\x-^) + s\x^;P) 


(36) 


where $x^{-i}$ denotes the actions of the other $(n-1)$
agents, $r^i$ represents the interdependence of the total targets,
and $s^i$ represents each agent’s preference to avoid detection.
The static inaccuracy (i.e. noise-to-signal ratio) of each query
for authentication is contained in the elements of
$\beta \in \mathbb{R}^m_+$. The malicious social budget of each
agent $i$ is given by $I_t^i$. The total
resources available to the authentication agent are limited such 
that in any operating period t the total resources available for 
queries for authentication is given by Consider 

the case with m = 2 fictitious agents. As the authentication 
agent commits larger resources to increase the queries for 
authentication pt(l), the associated number of friends and 
followers captured by the malicious agent a;t(l) decreases. 
Given that the total resources available to the authentication 
agent is limited, as Pt(l) increases Pt(2) must decrease. This 
causes an increase in the friends and followers captured by 
the malicious agent for the m = 2 fictitious agent Xt(2). 
Therefore the malicious social budget is considered to satisfy 
the linear relation II = p[x\. Malicious agents are those that 
are engaged in a concave potential game which attempt to 
maximize their respective utility function (361, and normal 


agents which have no target preference and therefore select xl 
in a uniform random fashion. At each observation t, a noisy 
measurement yl (defined in (28i) is made of the actions xl- 
Given the dataset Dobs ( |28l l, a statistical test can be used to 
detect if malicious agents are present. 


The dataset $\mathcal{D}$ (22) for malicious agents is generated by
computing the maximizer, $\{x_t^i\}_{i \in \{1,2,\dots,n\}}$, of the
concave potential function $V$, with $u^i(\cdot)$ defined by (36),
for a given probe $p_t$ (refer to (27)). The parameter values for
the numerical example are $n = 3$, $m = 2$,
$\beta = [0.03, 0.08]$, and $\gamma = 0.05$, where $n$, $m$, and
$\beta$ are defined in Sec. VI-B. The malicious social budget for
each of the $n = 3$ agents is generated from the normal
distributions $I_t^1 \sim \mathcal{N}(20, 1)$,
$I_t^2 \sim \mathcal{N}(50, 1)$, and
$I_t^3 \sim \mathcal{N}(80, 4)$. The queries for authentication
$p_t$ are generated from a uniform distribution. For normal agents,
$\mathcal{D}$ (22) is constructed from $x_t^i$ obtained from a
uniform random variable with upper limit 50. The datasets
$\mathcal{D}_{\text{obs}}$ (28) are obtained using the clean dataset
$\mathcal{D}$ and additive noise
$w_t^i \sim \mathcal{N}(0, \kappa)$, where $\kappa$ represents the
magnitude of the measurement error.
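
The data-generation step for normal agents and for the noisy
observations can be sketched in Python as follows. This is a minimal
sketch under stated assumptions, not the code used for the reported
results: the lower limit of the uniform response distribution and
the probe range are not recoverable from the text, so `X_LOW`,
`P_LOW`, and `P_HIGH` are hypothetical placeholders, and $\kappa$ is
treated as a noise variance. The malicious-agent responses, which
require maximizing the potential function, are not reproduced here.

```python
import random

N_AGENTS = 3               # n, from the numerical example
M_ACCOUNTS = 2             # m, number of fictitious accounts
T_OBS = 20                 # T, number of observations
KAPPA = 0.1                # noise magnitude, treated here as a variance (assumption)
X_LOW, X_HIGH = 1.0, 50.0  # X_LOW is hypothetical; the upper limit 50 is from the text
P_LOW, P_HIGH = 0.1, 1.0   # hypothetical probe range (not given in the text)


def budget(p_t, x_i):
    """Linear budget relation I = p' x for one agent at one observation."""
    return sum(p * x for p, x in zip(p_t, x_i))


def generate_normal_dataset(rng):
    """Return lists (probes, actions, observations), one entry per observation t."""
    probes, actions, observations = [], [], []
    for _ in range(T_OBS):
        p_t = [rng.uniform(P_LOW, P_HIGH) for _ in range(M_ACCOUNTS)]
        # Normal agents have no target preference: responses are uniform draws.
        x_t = [[rng.uniform(X_LOW, X_HIGH) for _ in range(M_ACCOUNTS)]
               for _ in range(N_AGENTS)]
        # Noisy measurement y = x + w with zero-mean Gaussian w.
        y_t = [[x + rng.gauss(0.0, KAPPA ** 0.5) for x in x_i] for x_i in x_t]
        probes.append(p_t)
        actions.append(x_t)
        observations.append(y_t)
    return probes, actions, observations


rng = random.Random(0)
probes, actions, observations = generate_normal_dataset(rng)
```

The helper `budget` evaluates the linear relation between probe and
response for a single agent, so `budget(probes[t], actions[t][i])`
gives the budget implied by agent $i$'s response at observation $t$.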

Fig. 8 plots the estimated cost (34) versus the iterates generated
by the SPSA algorithm (35) for $a = 0.1$, $\epsilon = 0.2$, noise
magnitude $\kappa = 0.1$, and $T = 20$ observations. Fig. 8
illustrates that, by judiciously adapting the external influence via
a stochastic gradient algorithm, the probability of Type-II errors
can be decreased to approximately 30%, allowing the statistical test
(30) to adequately reject normal agents.
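
To make the role of the stochastic gradient algorithm concrete, the
following Python sketch implements a generic simultaneous
perturbation stochastic approximation (SPSA) iteration with constant
step and perturbation sizes. It is an illustration only: the cost
(34) is an estimated Type-II error probability that cannot be
reproduced from this section, so a hypothetical noisy quadratic
`toy_cost` is swapped in, and the names `spsa_minimize`, `step`, and
`perturb` are our own.

```python
import random


def spsa_minimize(noisy_cost, theta0, step=0.1, perturb=0.2,
                  iterations=500, seed=0):
    """Generic SPSA: two noisy cost evaluations per iteration, any dimension."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iterations):
        # Random simultaneous perturbation with independent +/-1 entries.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + perturb * d for t, d in zip(theta, delta)]
        minus = [t - perturb * d for t, d in zip(theta, delta)]
        diff = noisy_cost(plus, rng) - noisy_cost(minus, rng)
        # Gradient estimate component i is diff / (2 * perturb * delta_i).
        theta = [t - step * diff / (2.0 * perturb * d)
                 for t, d in zip(theta, delta)]
    return theta


def toy_cost(p, rng, target=(1.0, 2.0), noise_std=0.01):
    """Hypothetical stand-in for the estimated cost: noisy quadratic."""
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target)) + rng.gauss(0.0, noise_std)


estimate = spsa_minimize(toy_cost, [0.0, 0.0])
```

Only two cost evaluations are needed per iteration regardless of the
dimension of the probe, which is the usual reason for preferring
SPSA when each cost evaluation is itself a Monte Carlo estimate.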

Fig. 8. Performance of the SPSA algorithm for computing the locally
optimal external influence $p$ to reduce the probability of Type-II
errors of the statistical test (30), plotted against the iteration
number. The parameters are defined in Sec. VI-B.

Fig. 9 plots the probability that a dataset
$\mathcal{D}_{\text{obs}}$ (28) satisfies the decision test (25) and
statistical test (30) for agents engaged in a concave potential
game. The locally optimized external influence $p$ was obtained from
the results of the SPSA algorithm above, allowing the malicious and
normal agents to be distinguished. As seen, the occurrence of Type-I
errors in the statistical test is less than 5%, as expected from
Theorem 5.3 because $\gamma = 5\%$.



Fig. 9. Performance of the decision test (25) and statistical test (30) for the
detection of malicious agents and normal agents. The parameters are defined
in Sec. VI-B.


VII. Summary 

The unifying theme of this paper was to study equilibrium play in
non-cooperative games amongst agents in a social network. The first
part focused on the distributed reinforcement learning aspect of an
equilibrium notion, namely, correlated equilibrium. Agents with
identical homophilic characteristics formed social groups wherein
they shared past experiences over the network topology graph. A
reinforcement learning algorithm was presented that, relying on
diffusion cooperation strategies in adaptive networks, facilitated
the learning dynamics. It was shown that, if all agents follow the
proposed algorithm, the global behavior of the network of agents is
attracted to the correlated equilibria set of the game. The second
part focused on parsing datasets from a social network for detecting
play from the equilibrium of a concave potential game. A
non-parametric decision test and a statistical test were constructed
to detect equilibrium play; these require only the external
influence and the actions of agents. To reduce the probability of
Type-II errors, a stochastic gradient algorithm was given to adapt
the external influence in real time. Finally, we illustrated the
application of the decision test, statistical test, and stochastic
gradient algorithm in a real-world example using the energy market,
and provided a numerical example to detect malicious agents in an
online social network. An important property of both aspects
considered in this paper is their ordinal nature, which provides a
useful approximation to human behavior.


Appendix A

Sketch of the Proof of Theorem 4.1


The convergence analysis is based on [53] and is organized into
three steps. For brevity and better readability, details for each
step of the proof are omitted; however, adequate references are
provided for the interested reader.

Step 1: The first step uses weak convergence methods to characterize
the limit individual behavior of agents following Algorithm 1 as a
dynamical system represented by a differential inclusion.
Differential inclusions are generalizations of ODEs [54]. Below, we
provide a precise definition.

Definition A.1: A differential inclusion is a dynamical system of
the form

$$\frac{dX}{dt} \in \mathcal{F}(X), \tag{37}$$

where $X \in \mathbb{R}^r$ and
$\mathcal{F} : \mathbb{R}^r \to \mathbb{R}^r$ is a Marchaud map
[54]. That is, i) the graph and domain of $\mathcal{F}$ are nonempty
and closed; ii) the values $\mathcal{F}(X)$ are convex; and iii) the
growth of $\mathcal{F}$ is linear: there exists $C > 0$ such that,
for every $X \in \mathbb{R}^r$,

$$\sup_{Y \in \mathcal{F}(X)} \|Y\| \le C\,(1 + \|X\|), \tag{38}$$

where $\|\cdot\|$ denotes any norm on $\mathbb{R}^r$.
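
For intuition, the following worked example (our own illustration,
not taken from the analysis in [53]) checks the three conditions of
a Marchaud map for a simple scalar set-valued map. The map and the
constant $C = 1$ are assumptions chosen only to make the conditions
easy to verify.

```latex
\[
  \mathcal{F}(X) = \{\, -X + u \;:\; u \in [-1, 1] \,\}
                 = [-X - 1,\; -X + 1], \qquad X \in \mathbb{R}.
\]
% i)   The domain is all of R, and the graph
%      {(X, Y) : |Y + X| <= 1} is nonempty and closed.
% ii)  Each value F(X) is a closed interval, hence convex.
% iii) Linear growth holds with C = 1:
\[
  \sup_{Y \in \mathcal{F}(X)} |Y|
    = \max\{\, |X + 1|,\; |X - 1| \,\}
    = |X| + 1
    \le 1 \cdot (1 + |X|).
\]
```

Any solution of $dX/dt \in \mathcal{F}(X)$ for this map is an
ordinary trajectory of $dX/dt = -X + u(t)$ with $|u(t)| \le 1$,
which illustrates how a differential inclusion collects a family of
ODEs under one set-valued right-hand side.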

We proceed to study the properties of the sequence of decisions
$\{a_n^k\}$ made according to Algorithm 1, which forms a finite-state
Markov chain (see Sec. III-D). Standard results on Markov chains
show that its transition matrix admits (at least) one invariant
measure, denoted by $\sigma^k$. The following lemma characterizes
the properties of such an invariant measure.

Lemma A.1: The invariant measure $\sigma^k(R^k)$ of the transition
probabilities takes the form

$$\sigma^k(R^k) = (1 - \delta)\,\psi^k(R^k) + (\delta / A) \cdot \mathbf{1}_A, \tag{39}$$

where $\psi^k(R^k)$ satisfies

$$\cdots \tag{40}$$

and $|x|^+ = \max\{x, 0\}$.
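
The structure of (39), a convex combination of a regret-dependent
distribution and the uniform distribution over the $A$ actions, can
be sketched in Python as follows. This is a minimal illustration:
the function name `mix_with_uniform` and the example probability
vector are hypothetical, and the regret-dependent component is
simply passed in rather than computed.

```python
def mix_with_uniform(base, delta):
    """Return (1 - delta) * base + (delta / A) * ones for a distribution over A actions."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if abs(sum(base) - 1.0) > 1e-9 or min(base) < 0.0:
        raise ValueError("base must be a probability vector")
    num_actions = len(base)
    return [(1.0 - delta) * p + delta / num_actions for p in base]


# Example with A = 3 actions and exploration weight delta = 0.1.
sigma = mix_with_uniform([0.7, 0.2, 0.1], delta=0.1)
```

Every action receives probability at least $\delta / A$, which is
the exploration floor that the parameter $\delta$ controls.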

In light of the diffusion protocol, agents' successive decisions
affect not only their own future decision strategies but also their
neighbors' policies. This suggests looking at the dynamics of the
regret for the entire network:
$\mathbf{R}_n := \operatorname{col}(R_n^1, \ldots, R_n^K)$. Using
techniques from the theory of stochastic approximations [41], we
work with the piecewise constant continuous-time interpolations of
$\mathbf{R}_n$, defined by

$$\mathbf{R}^\varepsilon(t) = \mathbf{R}_n \quad \text{for } t \in [n\varepsilon, (n+1)\varepsilon), \tag{41}$$

to derive the limiting process associated with $\mathbf{R}_n$. Let
$\Delta\mathcal{A}^{-k}$ represent the simplex of all probability
distributions over the joint action profiles of all agents excluding
agent $k$, and let $\otimes$ denote the Kronecker product.
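
The piecewise constant interpolation in (41) can be sketched in
Python as follows. This is a minimal illustration with scalar
iterates: the function name `interpolate` is hypothetical, and
holding the last iterate beyond the end of the recorded sequence is
our own convention, not part of the definition.

```python
import math


def interpolate(sequence, step, t):
    """Return sequence[n] for t in [n * step, (n + 1) * step)."""
    if t < 0.0:
        raise ValueError("t must be nonnegative")
    # Note: for step sizes that are not exactly representable, t / step can
    # round just below an integer at interval boundaries.
    n = math.floor(t / step)
    # Convention (assumption): hold the last iterate beyond the recorded data.
    n = min(n, len(sequence) - 1)
    return sequence[n]


iterates = [10.0, 20.0, 30.0]  # hypothetical iterates R_0, R_1, R_2
value = interpolate(iterates, step=0.5, t=1.2)
```

As the step size shrinks, the same continuous-time horizon covers
more iterates, which is what allows the discrete recursion to be
compared with a continuous-time limit.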
Theorem A.1: Consider the interpolated process
$\mathbf{R}^\varepsilon(\cdot)$ defined in (41). Then, as
$\varepsilon \to 0$, $\mathbf{R}^\varepsilon(\cdot)$ converges
weakly to $\mathbf{R}(\cdot)$, which is a solution of the system of
interconnected differential inclusions

$$\frac{d\mathbf{R}}{dt} \in \mathcal{H}(\mathbf{R}) + (\mathcal{C} - I)\,\mathbf{R}, \tag{42}$$

where $\mathcal{C} = C \otimes I_A$, $A$ denotes the cardinality of
the agents' action set, and

$$\mathcal{H}_i(\mathbf{R}) = \{\,\cdots\, ;\ p^{-\kappa} \in \Delta\mathcal{A}^{-\kappa}\}, \qquad \iota = i \bmod A, \quad \kappa = \cdots + 1.$$

Further, $\sigma^\kappa$ represents the stationary distribution
characterized in Lemma A.1.

Proof: The proof relies on stochastic averaging theory 
and is omitted for brevity. The interested reader is referred 
to 03 Appendix B] for a detailed proof. ■ 

Step 2: Next, we examine stability of the limit system and show
that its set of global attractors comprises an
$\epsilon$-neighborhood of the negative orthant. With a slight abuse
of notation, we rearrange the elements of the global regret matrix
$\mathbf{R}$ as a vector, but still denote it by $\mathbf{R}$.

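Theorem A.2: Consider the limit dynamical system (42). Let

$$\mathcal{M}_\epsilon = \{x :\ |x|^+ \le \epsilon \mathbf{1}\}. \tag{43}$$

Let further $\mathbf{R}(0) = \mathbf{R}_0$. Then, for each
$\epsilon > 0$, there exists $\hat{\delta}(\epsilon) > 0$ such that
if $\delta < \hat{\delta}(\epsilon)$ in (39) (or equivalently in the
decision strategy), the set $\mathcal{M}_\epsilon$ is globally
asymptotically stable for the limit system (42). That is,

$$\lim_{t \to \infty} \operatorname{dist}[\mathbf{R}(t), \mathcal{M}_\epsilon] = 0, \tag{44}$$

where $\operatorname{dist}[\cdot, \cdot]$ denotes the usual distance
function.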

Proof: The proof uses Lyapunov stability theory and is 
omitted for brevity. ■ 
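
The distance in (44) has a simple closed form when the norm is
Euclidean, since the nearest point of the set in (43) is obtained by
clipping each component at the neighborhood radius. The Python
sketch below assumes the Euclidean norm and reads (43) as the set of
vectors whose components do not exceed the radius; the function name
`dist_to_neg_orthant` is hypothetical.

```python
import math


def dist_to_neg_orthant(x, eps):
    """Euclidean distance from x to {z : z_i <= eps for all i}."""
    # Projection clips each component at eps, so only the excess contributes.
    return math.sqrt(sum(max(xi - eps, 0.0) ** 2 for xi in x))


# A regret vector already inside the neighborhood has zero distance.
inside = dist_to_neg_orthant([-1.0, 0.05], eps=0.1)
# Components 0.4 and 0.5 exceed eps by 0.3 and 0.4, giving distance 0.5.
outside = dist_to_neg_orthant([0.4, -2.0, 0.5], eps=0.1)
```

Tracking this quantity along a simulated regret sequence gives a
direct numerical check of whether the regrets are approaching the
neighborhood of the negative orthant.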

Subsequently, we study asymptotic stability by looking at the case
where $\varepsilon \to 0$, $n \to \infty$, and
$\varepsilon n \to \infty$. Nevertheless, instead of considering a
two-stage limit by first letting $\varepsilon \to 0$ and then
$t \to \infty$, we study $\mathbf{R}^\varepsilon(t + t_\varepsilon)$
and require $t_\varepsilon \to \infty$ as $\varepsilon \to 0$. The
following corollary asserts that the results of Theorem A.2 also
hold for the interpolated processes.

Corollary A.1: Denote by $\{t_\varepsilon\}$ any sequence of real
numbers satisfying $t_\varepsilon \to \infty$ as
$\varepsilon \to 0$. Suppose
$\{\mathbf{R}_n : \varepsilon > 0,\ n < \infty\}$ is tight or
bounded in probability. Then, for each $\epsilon > 0$, there exists
$\hat{\delta}(\epsilon) > 0$ such that if
$0 < \delta < \hat{\delta}(\epsilon)$ in (39),
$\mathbf{R}^\varepsilon(\cdot + t_\varepsilon) \to \mathcal{M}_\epsilon$
in probability, where $\mathcal{M}_\epsilon$ is defined in (43).

The above corollary completes the proof of the first result in
Theorem 4.1.

Step 3: In the final step, we show that the convergence of the
regrets of individual agents to an $\epsilon$-neighborhood of the
negative orthant provides the necessary and sufficient condition for
convergence of their global behavior to the correlated
$\epsilon$-equilibria set. This is summarized in the following
theorem.


Footnote: Let $Z_n$ and $Z$ be $\mathbb{R}^r$-valued random vectors.
$Z_n$ converges weakly to $Z$, denoted by $Z_n \Rightarrow Z$, if
for any bounded and continuous function $\psi(\cdot)$,
$E\psi(Z_n) \to E\psi(Z)$ as $n \to \infty$.


Theorem A.3: Recall the interpolated processes for the global regret
matrix $\mathbf{R}^\varepsilon(\cdot)$, defined in (41), and the
agents' collective behavior $z^\varepsilon(\cdot)$. Then,
$z^\varepsilon(\cdot)$ converges in probability to the correlated
$\epsilon$-equilibrium if and only if
$\mathbf{R}^\varepsilon(\cdot) \to \mathcal{M}_\epsilon$ in
probability, where $\mathcal{M}_\epsilon$ is defined in (43).

Proof: The proof relies on how the regrets are defined. 
The interested reader is referred to Section IV-D] for a 
somewhat similar proof. ■ 


The above theorem, together with Corollary A.1, completes the proof
of the second result in Theorem 4.1.


Appendix B
Proof of Theorem 5.3


Consider a dataset $\mathcal{D}$ (22) that satisfies Nash
rationality. Given $\mathcal{D}$, the inequalities (25) have a
feasible solution. Denote the solution parameters of (25), given
$\mathcal{D}$, by $\{\lambda_t^{io}, V_t^o\}$. Substituting
$x_t^i = y_t^i - w_t^i$ from (28) into the inequalities obtained
from the solution of (25) given $\mathcal{D}$, we obtain the
inequalities:

$$V_\tau^o - V_t^o - \sum_{i=1}^n \lambda_t^{io}\, p_t'(y_\tau^i - y_t^i) \le \sum_{i=1}^n \lambda_t^{io}\, p_t'(w_t^i - w_\tau^i). \tag{45}$$

The goal is to compute an upper bound on the r.h.s. of (45) that is
independent of $\lambda_t^{io}$. Notice that the following
inequalities provide an upper bound on the r.h.s. of (45):

$$\sum_{i=1}^n \lambda_t^{io}\, p_t'(w_t^i - w_\tau^i) \le \sum_{i=1}^n \lambda_t^{io}\, |p_t'(w_t^i - w_\tau^i)| \le \Big(\sum_{i=1}^n \lambda_t^{io}\Big)\Big(\sum_{i=1}^n |p_t'(w_t^i - w_\tau^i)|\Big) \le \Lambda_t^o M, \tag{46}$$

with $\Lambda_t^o = \sum_{i=1}^n \lambda_t^{io}$ and $M$ defined by
(32). Substituting (46) into (45), the following inequalities are
obtained:

$$\frac{1}{\Lambda_t^o}\Big(V_\tau^o - V_t^o - \sum_{i=1}^n \lambda_t^{io}\, p_t'(y_\tau^i - y_t^i)\Big) \le M. \tag{47}$$

A solution of (31) given $\mathcal{D}_{\text{obs}}$, defined by
(28), is denoted by $\{\Phi^*(y), \lambda_t^{i*}, V_t^*\}$. By
comparing the inequalities obtained from the solution of (31) given
$\mathcal{D}_{\text{obs}}$ with the inequalities (47), notice that
$\{\Phi(y) = M,\ \lambda_t^i = \lambda_t^{io},\ V_t = V_t^o\}$ is a
feasible, but not necessarily optimal, solution of (31) given
$\mathcal{D}_{\text{obs}}$. Therefore, for $\mathcal{D}$ satisfying
Nash rationality, it must be the case that $\Phi^*(y) \le M$. This
asserts, under the null hypothesis $H_0$, that $\Phi^*(y)$ is upper
bounded by $M$. For a given $\Phi^*(y)$, the integral in (30) is the
probability of $\Phi^*(y) \le M$; therefore, the conditional
probability of rejecting $H_0$ when true is less than $\gamma$. ■
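
The first two inequalities in the bound on the r.h.s. of (45) are
elementary and hold for any nonnegative multipliers. The Python
sketch below checks them numerically on random draws. It is only a
sanity check under our reading of the garbled derivation: the
function name `bound_chain` and the sampling ranges are
hypothetical, and the scalars in `terms` stand in for the
probe-weighted noise differences.

```python
import random


def bound_chain(multipliers, terms):
    """Return (sum l_i a_i, sum l_i |a_i|, (sum l_i)(sum |a_i|)) for l_i >= 0."""
    weighted = sum(l * a for l, a in zip(multipliers, terms))
    weighted_abs = sum(l * abs(a) for l, a in zip(multipliers, terms))
    product = sum(multipliers) * sum(abs(a) for a in terms)
    return weighted, weighted_abs, product


rng = random.Random(1)
violations = 0
for _ in range(1000):
    multipliers = [rng.uniform(0.0, 5.0) for _ in range(3)]  # nonnegative weights
    terms = [rng.gauss(0.0, 1.0) for _ in range(3)]          # signed noise terms
    lhs, mid, rhs = bound_chain(multipliers, terms)
    if not (lhs <= mid + 1e-12 and mid <= rhs + 1e-12):
        violations += 1
```

The second inequality is loose whenever more than one multiplier is
positive, which is consistent with the resulting solution being
feasible but not necessarily optimal.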


References 

[1] D. Fudenberg and D. K. Levine, “Learning and equilibrium,” Annual 
Review of Economics, vol. 1, pp. 385-420, 2009. 

[2] C. Shalizi and A. Thomas, “Homophily and contagion are generically
confounded in observational social network studies,” Sociological
Methods & Research, vol. 40, no. 2, pp. 211-239, 2011.

[3] S. Aral, L. Muchnik, and A. Sundararajan, “Distinguishing influence-
based contagion from homophily-driven diffusion in dynamic networks,”
Proceedings of the National Academy of Sciences, vol. 106, no. 51, pp.
21544-21549, 2009.























[4] R. J. Aumann, “Correlated equilibrium as an expression of Bayesian
rationality,” Econometrica, vol. 55, no. 1, pp. 1-18, 1987.

[5] A. H. Sayed, “Adaptive networks,” Proc. IEEE, vol. 102, no. 4, pp.
460-497, 2014.

[6] A. H. Sayed, “Adaptation, learning, and optimization over networks,”
Foundations and Trends in Machine Learning, vol. 7, no. 4-5, pp. 311-801,
2014.

[7] S. Hart and A. Mas-Colell, “A simple adaptive procedure leading to
correlated equilibrium,” Econometrica, vol. 68, no. 5, pp. 1127-1150,
2000.

[8] S. Hart and A. Mas-Colell, “A reinforcement procedure leading to
correlated equilibrium,” Economics Essays: A Festschrift for Werner
Hildenbrand, pp. 181-200, 2001.

[9] S. Hart, A. Mas-Colell, and Y. Babichenko, Simple Adaptive Strategies:
From Regret-Matching to Uncoupled Dynamics. World Scientific
Publishing, 2013.

[10] R. Deb, “A testable model of consumption with externalities,” J. Econ.
Theory, vol. 144, no. 4, pp. 1804-1816, 2009.

[11] M. O. Jackson and A. Wolinsky, “A strategic model of social and
economic networks,” J. Econ. Theory, vol. 71, no. 1, pp. 44-74, 1996.

[12] C. Griffin and A. Squicciarini, “Toward a game theoretic model of
information release in social media with experimental results,” in Proc.
of the IEEE Symp. on Security and Privacy Workshops, San Francisco,
CA, 2012, pp. 113-116.

[13] M. Kearns, “Graphical games,” in Algorithmic Game Theory, N. Nisan,
T. Roughgarden, E. Tardos, and V. V. Vazirani, Eds. Cambridge Univ.
Press, 2007, vol. 3, pp. 159-180.

[14] V. Krishnamurthy, O. N. Gharehshiran, and M. Hamdi, “Interactive
sensing and decision making in social networks,” Foundations and
Trends in Signal Processing, vol. 7, no. 1-2, pp. 1-196, 2014.

[15] E. Cartwright and M. Wooders, “Correlated equilibrium, conformity
and stereotyping in social groups,” Journal of Public Economic Theory,
vol. 16, no. 5, pp. 743-766, 2014.

[16] J. Xu and M. van der Schaar, “Social norm design for information
exchange systems with limited observations,” IEEE J. Sel. Areas Commun.,
vol. 30, no. 11, pp. 2126-2135, Dec. 2012.

[17] O. N. Gharehshiran, V. Krishnamurthy, and G. Yin, “Distributed tracking 
of correlated equilibria in regime switching noncooperative games,” 
IEEE Trans. Autom. Control, vol. 58, no. 10, pp. 2435-2450, 2013. 

[18] D. P. Bertsekas, “A new class of incremental gradient methods for least 
squares problems,” SIAM J. Optim., vol. 7, no. 4, pp. 913-926, 1997. 

[19] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed
Computation: Numerical Methods. Singapore: Athena Scientific, 1997.

[20] S.-Y. Tu and A. H. Sayed, “Diffusion strategies outperform consensus 
strategies for distributed estimation over adaptive networks,” IEEE 
Trans. Signal Process., vol. 60, no. 12, pp. 6217-6234, 2012. 

[21] S. Afriat, “The construction of utility functions from expenditure data,”
Int. Econ. Rev., vol. 8, no. 1, pp. 67-77, 1967.

[22] H. Varian, “Revealed preference and its applications,” The Economic 
Journal, vol. 122, no. 560, pp. 332-338, 2012. 

[23] H. Varian, “Non-parametric tests of consumer behaviour,” Rev. Econ.
Stud., vol. 50, no. 1, pp. 99-110, 1983.

[24] D. Monderer and L. Shapley, “Potential games,” Games Econ. Behav.,
vol. 14, no. 1, pp. 124-143, 1996.

[25] R. Rosenthal, “A class of games possessing pure-strategy Nash
equilibria,” Int. J. Game Theory, vol. 2, no. 1, pp. 65-67, 1973.

[26] A. Chapman, G. Verbic, and D. Hill, “A healthy dose of reality for
game-theoretic approaches to residential demand response,” in Proc. of the
2013 IREP Symposium - Bulk Power System Dynamics and Control-IX
Optimization, Security and Control of the Emerging Power Grid. IEEE,
2013, pp. 1-13.

[27] V. Bala and S. Goyal, “A noncooperative model of network formation,” 
Econometrica, vol. 68, no. 5, pp. 1181-1229, 2000. 

[28] K. Zhu and J. P. Weyant, “Strategic decisions of new technology
adoption under asymmetric information: A game-theoretic model,” Decision
Sciences, vol. 34, no. 4, pp. 643-675, 2003.

[29] E. Gudes, N. Gal-Oz, and A. Grubshtein, “Methods for computing 
trust and reputation while preserving privacy,” in Data and Applications 
Security XXIII, ser. Lecture Notes in Computer Science, E. Gudes and 
J. Vaidya, Eds., 2009, vol. 5645, pp. 291-298. 

[30] L. Mui, “Computational models of trust and reputation: Agents,
evolutionary games, and social networks,” Ph.D. dissertation, MIT, 2002.

[31] Y. Zhang and M. van der Schaar, “Strategic networks: Information
dissemination and link formation among self-interested agents,” IEEE
J. Sel. Areas Commun., vol. 31, no. 6, pp. 1115-1123, 2013.


[32] P. Golle, K. Leyton-Brown, I. Mironov, and M. Lillibridge, “Incentives
for sharing in peer-to-peer networks,” in Electronic Commerce, ser.
Lecture Notes in Computer Science, L. Fiege, G. Mühl, and U. Wilhelm,
Eds., 2001, vol. 2232, pp. 75-87.

[33] E. A. Jorswieck, E. G. Larsson, M. Luise, H. V. Poor, and A. Leshem,
“Game theory and the frequency selective interference channel,” IEEE
J. Sel. Topics Signal Process., vol. 6, no. 2, pp. 73-75, 2012.

[34] S. Ba, A. B. Whinston, and H. Zhang, “The dynamics of the electronic
market: An evolutionary game approach,” Information Systems Frontiers,
vol. 2, no. 1, pp. 31-40, 2000.

[35] M. Pelillo, “What is a cluster? Perspectives from game theory,” in Proc.
of the NIPS Workshop on Clustering Theory, 2009.

[36] B. Wang, K. J. R. Liu, and T. C. Clancy, “Evolutionary game framework 
for behavior dynamics in cooperative spectrum sensing,” in Proc. of 
IEEE GLOBECOM, New Orleans, LA, 2008, pp. 1-5. 

[37] M. Slikker and A. van den Nouweland, “Network formation models with 
costs for establishing links,” Review of Economic Design, vol. 5, no. 3, 
pp. 333-362, 2000. 

[38] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over 
adaptive networks: Formulation and performance analysis,” IEEE Trans. 
Signal Process., vol. 56, no. 7, pp. 3122-3136, 2008. 

[39] O. N. Gharehshiran, V. Krishnamurthy, and G. Yin, “Distributed energy- 
aware diffusion least mean squares: Game-theoretic learning,” IEEE J. 
Sel. Topics Signal Process., vol. 7, no. 5, pp. 821-836, 2013. 

[40] A. Benveniste, M. Métivier, and P. Priouret, Adaptive Algorithms and
Stochastic Approximations. New York, NY: Springer-Verlag, 1990.

[41] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive 
Algorithms and Applications, 2nd ed., ser. Stochastic Modeling and 
Applied Probability. New York, NY: Springer-Verlag, 2003. 

[42] W. Diewert, “Afriat’s theorem and some extensions to choice under
uncertainty,” The Economic Journal, vol. 122, no. 560, pp. 305-331,
2012.

[43] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: 
Cambridge Univ. Press, 2004. 

[44] P. Maillé and B. Tuffin, Telecommunication Network Economics: From
Theory to Applications. Cambridge, UK: Cambridge Univ. Press, 2014.

[45] N. Alon, M. Feldman, A. Procaccia, and M. Tennenholtz, “A note on
competitive diffusion through social networks,” Information Processing
Letters, vol. 110, no. 6, pp. 221-225, 2010.

[46] W. Hoiles and V. Krishnamurthy, “Nonparametric demand forecasting
and detection of demand-responsive consumers,” IEEE Trans. Smart
Grid.

[47] V. Krishnamurthy and W. Hoiles, “Afriat’s test for detecting malicious
agents,” IEEE Signal Process. Lett., vol. 19, no. 12, pp. 801-804, 2012.

[48] J. C. Spall, Introduction to Stochastic Search and Optimization:
Estimation, Simulation, and Control, 2003.

[49] M. Granovetter, “The impact of social structure on economic outcomes,”
J. Econ. Perspect., vol. 19, no. 1, pp. 33-50, 2005.

[50] C. Ibars, M. Navarro, and L. Giupponi, “Distributed demand
management in smart grid with a congestion game,” in Proc. of the 1st IEEE
Intl. Conf. on Smart Grid Communications, Gaithersburg, MD, 2010,
pp. 495-500.

[51] A. Ramachandran, N. Feamster, and S. Vempala, “Filtering spam with 
behavioral blacklisting,” in Proc. of the 14th ACM conf. on Computer 
and Communications Security. ACM, 2007, pp. 342-351. 

[52] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, “The rise 
of social bots,” arXiv preprint, 2014. 

[53] M. Benaim, J. Hofbauer, and S. Sorin, “Stochastic approximations and
differential inclusions, part II: applications,” Math. Oper. Res., vol. 31,
no. 4, pp. 673-695, 2006.

[54] J. P. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and 
Viability Theory. New York, NY: Springer-Verlag, 1984. 



